A four-panel illustration; in the first panel, an alien is standing by a sleeping bag, holding its sleeve confidently. In the second panel, the alien tries to shove the sleeping bag into the sleeve, but it clearly won't fit. In the next panel, the alien has rolled up the sleeping bag and is forcefully trying to pull the sleeve over the bag. In the final panel, the alien is visibly exhausted, but has finally managed to get the sleeping bag into the sleeve.

Better than PNG

As a fun exercise, I designed a lossless image format for this website that compresses images about 30% better than PNG.

You might think to yourself: "30% better than PNG? Must not be very good then!" and honestly I'd have been right there with you before I went on this journey. PNG never performed well for me; I'm usually compressing images from PNG to another format, like WebP or AVIF. But the target images, which have some specific properties, turned out to compress surprisingly well with PNG, and I'm excited to highlight that hidden strength of PNG. But first, some context and backstory.

I should preemptively emphasize that, unless your goals include "having fun" and "learning about image compression", the practical benefits of a project like this are nowhere near worth the effort. Instead, compress your images using existing formats, like WebP or AVIF.

I own an e-ink note-taking device from Supernote. Occasionally, I use it to draw, and it can export those drawings as images. These images are (on their older models) strictly four-tone; there is no anti-aliasing, and every pixel is exactly white, light gray, dark gray, or black. Additionally, this site uses a cartoon-ish style, meaning the images contain areas of solid color rather than textures like dotting or hatching. These constraints are rather restrictive, and I thought I could construct a custom image format that compresses these images losslessly to the point where they rival modern lossy formats like WebP or AVIF. This was, of course, terribly naive of me, but usually that's how fun projects start.

Initial ideas

At first, I wanted to give things a shot without peeking at how other formats work, just to see how far I could get. The initial ideas were variations on the same concept: first, draw shape outlines; next, fill the enclosed areas. I iterated through several different ways of efficiently describing the outlines, as the fills were relatively straightforward. My initial estimates for this strategy were promising. Unfortunately, I had made the fatal mistake of overlooking how "heavy" coordinates are. In my predictions for the format's efficiency, I neglected to account for the initial (x, y) coordinates of each shape's outline, and these, as it turns out, made up a very significant portion of the required bytes. In the end, the format did not perform well, not even close to what PNG was doing, and it was needlessly difficult to convert existing images into it efficiently.

The hidden strength of PNG

Given that my initial ideas ended up performing much worse than PNG, I caved and started looking into how PNG works to get some ideas. This is when I learned that PNG performs exceptionally well for my target images.

PNG has several options and methods for different types of images. I didn't research them all; I only focused on the relevant ones. This explanation of PNG gives a general idea of what PNG can do but is not exhaustive by any means.

Broadly speaking, PNG combines three strategies for its exceeds-expectations level of compression: a color palette with a low bit depth, so that each pixel needs only a couple of bits; a filtering step that "guesses" each pixel from its already-decoded neighbors and stores only the difference between the guess and the actual value; and DEFLATE, a general-purpose compressor that thrives on the repetition the first two steps produce.

Especially for my target images, these three strategies make PNG unexpectedly powerful. The lack of anti-aliasing means the filtering step's guesses are very often spot-on, and the resulting repetition in the data lets DEFLATE reduce the required number of bytes by a lot. For reference, my main test image measures 1278 by 1828 pixels, which means that without DEFLATE, the raw list of pixels (at a bit depth of 2) would be about 584kB. In reality, the optimized, compressed PNG clocked in at a mere 70kB, and that includes additional metadata like the image dimensions and the color palette.
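For concreteness, the 584kB figure is nothing more than the pixel count times the bit depth:

```ts
// Raw size of the main test image at a bit depth of 2 (four tones).
const width = 1278;
const height = 1828;
const bitsPerPixel = 2;

const rawBytes = (width * height * bitsPerPixel) / 8;
console.log(`${Math.round(rawBytes / 1000)}kB`); // "584kB" (versus roughly 70kB as an optimized PNG)
```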

If you'd like to play around with image compression yourself, I recommend squoosh.app. It provides compression using OxiPNG, and you can "Reduce palette" to kick it into using a lower bit depth; that's what produced the 70kB figure for my test image.

Improving on PNG and giving up

After studying the genius behind PNG, I realized this "guessing pixels" strategy was very powerful. After all, if you could guess every pixel right algorithmically, you wouldn't need any data to represent the image. As such, I thought I would try to expand on the guessing strategy used in PNG. I didn't really have predictions about the impact I could make, but it was worth a shot. Since PNG only looks at the pixels to the left, top, and top left of the pixel being guessed, I figured I could look at pixels a bit further away. I even made a game of sorts in my terminal: it shows a sizable portion of the image, up to a certain pixel that PNG gets wrong. Can I myself guess it correctly? And how can I incorporate my intuition into the guessing algorithm?
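For reference, one of PNG's filters, the Paeth predictor, shows what this neighbor-based guessing looks like: it picks whichever of the left, top, and top-left neighbors lies closest to a simple linear estimate. A minimal sketch of that standard heuristic (not my extended variant):

```ts
// PNG's Paeth predictor: estimate a pixel as left + top - topLeft, then
// return whichever neighbor is closest to that estimate.
function paethPredictor(left: number, top: number, topLeft: number): number {
  const estimate = left + top - topLeft;
  const distLeft = Math.abs(estimate - left);
  const distTop = Math.abs(estimate - top);
  const distTopLeft = Math.abs(estimate - topLeft);
  if (distLeft <= distTop && distLeft <= distTopLeft) return left;
  if (distTop <= distTopLeft) return top;
  return topLeft;
}

// In flat, aliased artwork the neighbors usually agree, so the guess is
// exact and PNG only has to store a difference of zero.
```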

After adding a few specific cases, I did end up increasing the rate at which pixels were guessed correctly. This resulted in a gain of about 5% over PNG. Personally, I felt unsatisfied, and found it hard to justify a custom format for an improvement of that magnitude, especially because a custom format requires me to ship a parser along with the images, a cost that needs to be outweighed by the size reduction.

At this point, I decided that my newfound respect for PNG would be the sole prize of my journey. Exhausted from my numerous attempts, I let the compression obsession die out by spending my time on other things. After a week or two, I figured it was time to write a post on my newfound knowledge of PNG. Naturally, writing about it rekindled the drive to do better, and I told myself I could give it one more try.

Introducing: the SN format

Sure enough, this final attempt paid off! While my initial implementation gave some overly promising results due to subtle bugs, the methodology was sound. After ironing out the bugs and writing converters between PNG and my new format, the target images consistently came out around 25-30% smaller than PNG, beating even lossy WebP at its lowest quality settings.

As a nod to Supernote, without whom I would not have tried this, I chose .sn as the file extension. For the 8 magic bytes at the start of the files, I chose to use SNxVH0.1 (in ASCII). The SN for Supernote, VH for vrugtehagel, and 0.1 as a version indicator.
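As a small illustration, a decoder could check for those magic bytes like so (only the eight ASCII bytes themselves come from the format; the helper is my own):

```ts
// An .sn file starts with the 8 ASCII bytes "SNxVH0.1".
const SN_MAGIC = new TextEncoder().encode("SNxVH0.1");

function hasSnMagic(bytes: Uint8Array): boolean {
  return SN_MAGIC.every((byte, index) => bytes[index] === byte);
}
```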

The question, of course, is: how does this .sn format work? Well, fundamentally, it resembles PNG. But this is more of a conceptual likeness; even though the high-level approaches are similar, their serializations are vastly different.

To start off, unlike PNG, which lists differences between guesses and actual values, SN ranks all possible guesses from best to worst, and then stores the index of the correct guess for each pixel. Next, I increased the payoff for making good guesses: instead of spending a fixed number of bits per guess (like PNG), SN spends one additional bit for each incorrect guess. Specifically, the (zero-based) index of the correct guess determines how many 1 bits are emitted, and a final 0 terminates the string of 1s. For example, if the first guess is right, a single 0 is emitted; if the second guess is right, 10; if the third, 110; etcetera.
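A minimal sketch of that step (the helper name is mine, and the real implementation emits raw bits rather than strings):

```ts
// Encode the zero-based index of the correct guess as `index` 1-bits
// followed by a terminating 0.
function encodeGuessIndex(index: number): string {
  return "1".repeat(index) + "0";
}

encodeGuessIndex(0); // "0"   (best guess was right)
encodeGuessIndex(1); // "10"  (second guess was right)
encodeGuessIndex(2); // "110" (third guess was right)
```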

In practice, granted the guesses are good, this results in long strings of 0s with some intermittent 1s. That can already be compressed using a general-purpose compression method, but I wanted to do even better. The next step was to, rather than list out a bunch of zeroes, list how many zeroes there are. To illustrate the route I took, let's say we're compressing a run of 23 consecutive zeroes. In binary, 23 is 10111; pad that to an even number of bits (010111), split it into groups of two bits (01, 01, 11), and give each group a leading 0 bit, resulting in 001 001 011.

It's not possible to simply use 10111 directly, because it is ambiguous; it could also signify a single incorrect guess (10) followed by 7 zeroes (111). The leading 0 bit on each group is what resolves this ambiguity.

What was originally 23 bits is now 9 bits. Long sequences of zeroes become significantly shorter. Short sequences, on the other hand, can get a little longer; for example, a single 0 would become 001, which is three bits instead of one. This further reinforces the importance of getting guesses right, and the slight increase in bits for small runs of zeroes is far outweighed by the savings on the large ones. As a happy side effect, this method introduces some statistical regularity, in that every third bit in these sequences is a zero. This in turn helps brotli or gzip perform a bit better.
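Here is a sketch of that run-length step using the two-bit groups described above (the exact bit ordering is my assumption; the specification in the repository mentioned later is authoritative):

```ts
// Encode a run of correct guesses: write the run length in binary, pad it
// to an even number of bits, split it into 2-bit groups, and prefix every
// group with a 0 bit.
function encodeZeroRun(count: number): string {
  let bits = count.toString(2);
  if (bits.length % 2 === 1) bits = "0" + bits; // pad to a multiple of two
  let encoded = "";
  for (let i = 0; i < bits.length; i += 2) {
    encoded += "0" + bits.slice(i, i + 2); // the leading 0 marks a count group
  }
  return encoded;
}

encodeZeroRun(23); // "001001011" (9 bits instead of 23)
encodeZeroRun(1);  // "001"       (a single correct guess costs 3 bits)
```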

I experimented with different sizes for these groups, including using incrementally more bits (e.g. one bit for the first group, two for the second, etcetera), but ultimately groups of two bits seemed to perform best.

At this point, the importance of getting guesses right has grown considerably. I had already found some improvements to PNG's guessing strategy, so I built those into the SN format to push compression ratios even further.

That, in broad strokes, is how the format fits together. There are some details I've glossed over, like the file header, which contains the image dimensions and color palette. I also implemented some minor optimizations on top of the process above, like expressing the number of zeroes as zero-indexed, and omitting the terminating zero in "worst guess" scenarios. If you're curious about a more exact description of the format, I have a public repository on GitHub, vrugtehagel/sn, with an elaborate and precise specification and example implementations to convert to and from .sn.

Implementation

For the SN-to-bitmap implementation, I chose WASM (handwriting its text format, WAT), both to force myself to write reasonably efficient code and to reduce the overall footprint of the conversion script. The final WASM module is then inlined in a JS file to avoid an additional request, and the whole thing weighs in at well below 2kB. Thus, a page breaks even when it loads about 7 or 8 kilobytes' worth of images (and nearly every page does). The script is also cached across page loads, so this cost is paid only once, and all subsequent navigations benefit from the file size reductions for free.
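As a sketch of the inlining approach (the base64 constant stands in for whatever the build step produces, and the module's actual exports aren't shown):

```ts
// The compiled WASM module is embedded in the JS file as a base64 string,
// so loading the decoder costs no extra network request.
declare const inlinedWasmBase64: string; // hypothetical, produced at build time

async function loadDecoder(): Promise<WebAssembly.Instance> {
  const bytes = Uint8Array.from(atob(inlinedWasmBase64), (char) => char.charCodeAt(0));
  const { instance } = await WebAssembly.instantiate(bytes);
  return instance;
}
```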

That JavaScript takes the form of a service worker intercepting requests for .sn.png files. The double extension signals that a .sn file exists, but the naming convention also allows a regular PNG to load if the service worker hasn't been installed yet or is otherwise unavailable. In other words: the SN format is a progressive enhancement. Neat, right?
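A sketch of what such a service worker could look like, assuming a decodeSn helper backed by the inlined WASM module (its name and return shape are hypothetical, as is the exact response handling):

```ts
// Service worker sketch: serve decoded .sn files where possible, and fall
// back to the plain PNG otherwise (progressive enhancement).
declare const self: ServiceWorkerGlobalScope;
declare function decodeSn(bytes: ArrayBuffer): { body: Uint8Array; type: string };

self.addEventListener("fetch", (event) => {
  const url = new URL(event.request.url);
  if (!url.pathname.endsWith(".sn.png")) return;

  event.respondWith((async () => {
    try {
      const snResponse = await fetch(url.pathname.replace(/\.png$/, ""));
      if (!snResponse.ok) throw new Error("no .sn variant available");
      const { body, type } = decodeSn(await snResponse.arrayBuffer());
      return new Response(body, { headers: { "Content-Type": type } });
    } catch {
      return fetch(event.request); // the .sn.png URL itself serves a real PNG
    }
  })());
});
```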

Results

Let me preface these results by acknowledging that writing your own compression format is not worth the effort in terms of direct practical results. Not even a little bit. I did this mostly for fun and educational purposes; the actual bytes saved are merely a proof of concept. That said, let's see some numbers, shall we? First, let's look at how different compression formats perform for this post's header image.

A chart detailing the file size of various different formats, sorted largest to smallest. The first format is JPEG at 20% quality, resulting in 21.4kB. Next is PNG at nearly half that; 12.4kB. Then comes WebP, at its lowest quality setting, coming in at 9.5kB. AVIF follows closely with 8.1kB at 20% quality. Second last is the SN format, at 7.6kB, losing only to JPEG XL with a whopping 5.1kB using its lossless mode.

Here I've chosen low quality settings for the lossy formats on purpose, to give them a chance of getting close to the others. While WebP at its lowest setting is visually low quality, it still represents the image closely enough to compete as a semi-valid alternative. For AVIF and JPEG I chose a quality of 20%, since at 0% the images are mangled to the point that they aren't a realistic competitor. At 20% they are still noticeably compressed, but presentable.

JPEG XL is brutally efficient; it performs substantially better than SN. As of the creation of the SN format, however, JPEG XL is not supported by default in any major browser. Interest does seem to be increasing across the web platform, so I'm very excited for those developments, although I will admit that removing the SN format from my site in favor of JPEG XL will make me a little sad.

Now, let's look at a handful of pages and how much data they load both with and without the service worker intercepting image requests to use the SN format.

A chart outlining the savings for three different pages. First, the homepage is shown, which is by far the largest of the three pages, but SN made it 24% smaller. Next is /essays/notion-meets-eleventy/, which is less than half the total size, and only sees a reduction of 15%. Last is /essays/my-doubts-about-vento/, the smallest page, seeing only a 7% smaller footprint.

As expected, the homepage benefits the most as it contains by far the most imagery; a reduction of over 75kB. An image-heavy post, such as Notion meets Eleventy, is still a respectable 20kB smaller, but posts with only a header image, like My doubts about Vento, see only very minor benefits (but benefits nonetheless) of about 5kB.

Was it worth it? For me, absolutely. I've learned so much along the way and it's exciting to see it working live. Worth it for you? Almost definitely not - but in my opinion, projects are worth it if you enjoy working on them, even if they don't have much practical value.

Addendum

Early on in my website's redesign, my Supernote unfortunately broke down. At some point, a few months before I even started on the website, the screen got damaged in my backpack. After that, horizontal and vertical lines of dead pixels slowly accumulated, until eventually the screen as a whole gave in. When I revisited Supernote's website, I was pleasantly surprised to learn the company had pivoted more into repairability and extensibility. Naturally, I gladly supported that mission by purchasing their latest model as a replacement for my now-unusable device. In a cruel twist of fate, the newer model did have anti-aliasing, with no way to turn it off. Not only had I already drawn a fair few images for the website's redesign, but I had also progressed through the vast majority of the quest for better compression, all with aliased images in mind. Willingly falling deeper into the sunk-cost fallacy, I decided to write some additional code to post-process the anti-aliased images and remove the unwanted smoothing. That isn't a trivial problem either, but it's not a very interesting story, and the tangent is too unrelated to discuss here.