thumbnail encoding · image encoding
Learning With Errors
bytes & pixels

Starter draft. This grew out of the garage encoding study, where one photo runs through every encoder and the bytes get counted. The conversation is a first pass; the demo fetches the study's real output files and measures their actual sizes.

aadharsh

the garage page runs one photo through jpeg, jpegli, and avif and prints three different sizes. before I trust those numbers, tell me why one image even gives three sizes.

thumbnail encoding

An encoder throws away the detail your eye skips, then packs what survives. The three differ in how hard they push each step.

JPEG (1992) cuts the photo into 8x8 blocks, runs a cosine transform, and rounds off the high-frequency parts. jpegli (Google, 2024) keeps the exact JPEG format but rounds smarter, so it fits roughly 25% more quality into the same bytes as the old mozjpeg encoder. AVIF borrows AV1's video tricks, larger blocks and sharper prediction, so it usually wins on bytes-per-pixel outright.
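
To make the "cosine transform, then round" step concrete, here is a minimal sketch of it on one 8x8 block. The quantization steps are a made-up ramp that grows with frequency (an assumption for illustration, not a real JPEG table), and the input block is a hypothetical smooth gradient.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (8 rows of 8 numbers)."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):        # vertical frequency
        for v in range(N):    # horizontal frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, base_step=4, slope=4):
    """Divide each coefficient by a step that grows with frequency, then round.

    The step ramp is hypothetical; real encoders use tuned tables.
    """
    return [[round(coeffs[u][v] / (base_step + slope * (u + v)))
             for v in range(N)] for u in range(N)]


# Hypothetical block: a smooth left-to-right ramp, level-shifted by 128.
block = [[(16 * x + 8) - 128 for x in range(N)] for _ in range(N)]
quantized = quantize(dct_2d(block))
nonzero = sum(1 for row in quantized for q in row if q != 0)
print(f"{nonzero} of {N * N} coefficients survive rounding")
```

Smooth content collapses into a handful of low-frequency coefficients, and the zeros that remain are what the entropy coder packs cheaply.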

aadharsh

so jpegli writes a normal .jpg that any browser opens, just smaller?

thumbnail encoding

Right. jpegli stays inside the JPEG bitstream, so a browser from 1995 still decodes it. AVIF needs a modern decoder (Safari 16+, Chrome 85+). That trade, universal but larger against modern but smaller, is why the photo grid ships both and lets the browser pick.

aadharsh

ok, stop describing it. show me the bytes on one photo.

bytes-per-pixel, measured live

The same 400×266 photo (106,400 pixels), run through every encoder. These sizes are fetched live from the files the garage study actually produced, so the bytes are real. The last column is size against the lossless PNG.

sample photo, jpegli q82
encoder · size · b/px · vs PNG
[live table: one row per encoder, filled in from the real files when the page loads]

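The two derived columns are plain arithmetic on the file size. A minimal sketch, where the function names are made up for this example and the sizes are placeholders for illustration only, not the study's measurements:

```python
WIDTH, HEIGHT = 400, 266
PIXELS = WIDTH * HEIGHT  # 106,400


def bytes_per_pixel(size_bytes, pixels=PIXELS):
    """The b/px column: file size divided by pixel count."""
    return size_bytes / pixels


def ratio_vs_png(size_bytes, png_bytes):
    """The vs PNG column: file size as a fraction of the lossless PNG."""
    return size_bytes / png_bytes


# Placeholder sizes, NOT real results from the garage study.
example_sizes = {"png": 200_000, "some-lossy-encoder": 40_000}

for name, size in example_sizes.items():
    print(f"{name}: {bytes_per_pixel(size):.3f} b/px, "
          f"{ratio_vs_png(size, example_sizes['png']):.0%} of PNG")
```

To run it on real files, `os.path.getsize(path)` would supply `size_bytes`.
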
aadharsh

so on this photo jpegli is about half the baseline JPEG, and avif is smaller still. what's the catch with avif then?

thumbnail encoding

Decode support and encode time. AVIF leans on the AV1 codec, which is slower to encode and only decodes on recent browsers. jpegli is fast and universal. That is why the site builds jpegli from source for the universal fallback and uses AVIF as the primary <picture> source: the browser reaches for AVIF, and anything that cannot decode it falls back to the jpegli .jpg.
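
The fallback behavior can be modeled as a first-match scan over the sources in document order. This is a simplified sketch: it ignores media queries and srcset densities, treats the JPEG fallback as one more entry in the list, and the file names are hypothetical.

```python
def pick_source(sources, supported_types):
    """Return the URL of the first source whose MIME type the browser supports.

    sources: list of (mime_type, url) in document order.
    supported_types: set of MIME types this browser can decode.
    """
    for mime_type, url in sources:
        if mime_type in supported_types:
            return url
    return None


# Hypothetical file names; AVIF is listed first so it wins when supported.
sources = [("image/avif", "photo.avif"), ("image/jpeg", "photo.jpg")]

modern = pick_source(sources, {"image/avif", "image/webp", "image/jpeg"})
legacy = pick_source(sources, {"image/jpeg"})
print(modern, legacy)  # photo.avif photo.jpg
```

The order of the list is the whole policy: whatever comes first and is decodable is the only file requested.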

aadharsh

and the grayscale shots? why are those so much smaller?

thumbnail encoding

A color image carries one luma plane (brightness) plus two chroma planes (color). Drop to grayscale and you delete both chroma planes, two of the three planes and all of the color information, so the file falls hard. The garage study has the side-by-side counts.

aadharsh

so color is two of the three planes. JPEG and AVIF don't even keep color at full resolution, right? that's the subsampling thing?

thumbnail encoding

Right, chroma subsampling. Your eye resolves brightness far better than color, so codecs keep luma at full resolution and shrink the two chroma planes. 4:4:4 keeps everything; 4:2:2 halves chroma horizontally; 4:2:0, the web default, stores one chroma sample per 2x2 block, a quarter of the color resolution. The luma carries the sharpness, so you barely notice.
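
The ratios are easy to check by counting stored samples on the 400×266 photo. A small sketch, assuming odd dimensions round up when a plane is halved (the mode table and function name are made up for this example):

```python
import math

# (horizontal divisor, vertical divisor) applied to each chroma plane
MODES = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def sample_counts(width, height, mode):
    """Return (luma_samples, chroma_samples); chroma covers both planes."""
    h_div, v_div = MODES[mode]
    luma = width * height
    chroma_plane = math.ceil(width / h_div) * math.ceil(height / v_div)
    return luma, 2 * chroma_plane


for mode in MODES:
    luma, chroma = sample_counts(400, 266, mode)
    print(mode, luma, chroma, luma + chroma)
# 4:4:4 106400 212800 319200
# 4:2:2 106400 106400 212800
# 4:2:0 106400 53200 159600
```

At 4:2:0 the total is exactly half of 4:4:4, and a grayscale image keeps only the 106,400 luma samples.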

chroma subsampling · pick a mode

The top band is fine luma detail (black and white lines); the bottom is fine chroma detail (red and green lines) at the same spacing. Switch the mode: luma stays razor sharp at every setting, while the color detail dissolves as you drop chroma resolution.
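
Why the color stripes dissolve can be shown on a tiny plane. This sketch assumes 1-pixel stripes and a plain 2x2 box average for the 4:2:0 downsample; real encoders may use other filters, and the demo's exact stripe spacing is not stated here.

```python
def box_downsample_2x2(plane):
    """Average each 2x2 block into one sample (plane has even dimensions)."""
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]


# A 4x4 plane of 1-pixel vertical stripes: 255, 0, 255, 0 across each row.
stripes = [[255 if x % 2 == 0 else 0 for x in range(4)] for _ in range(4)]

luma = stripes                         # luma is stored at full resolution
chroma = box_downsample_2x2(stripes)   # chroma is stored at quarter resolution

print(luma[0])   # [255, 0, 255, 0]
print(chroma)    # [[127.5, 127.5], [127.5, 127.5]]
```

Every 2x2 block straddles one bright and one dark stripe, so the average is flat and the stripe pattern is gone before the encoder even starts rounding.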

aadharsh

wild, the black-and-white lines stay razor sharp at 4:2:0 but the red-green lines just dissolve. and that's half the samples gone.

thumbnail encoding

Exactly, and for photographs it is almost free, because real scenes rarely put fine high-contrast color edges right next to each other. Where it shows is red text on a dark background or saturated line art, which is why screenshots and graphics keep 4:4:4 while photos ship 4:2:0.

aadharsh

earlier you said jpegli just "rounds smarter." smarter how, if it writes the same JPEG format?

thumbnail encoding

It models your eye instead of trusting the fixed 1992 quantization tables. jpegli sets the quantization adaptively per block by what you would actually notice, rounds with adaptive dead-zones, and can optionally work in the XYB color space, which spaces colors the way you perceive them. The output is still a standard JPEG that any decoder reads (the XYB mode leans on an embedded ICC profile to show the right colors), but more bits land where your eye looks and fewer on detail you would never see. That is the rough 25% it gains over the old mozjpeg encoder.
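
A dead-zone is just a wider zero bin in the rounding step. The sketch below is a generic dead-zone quantizer, not jpegli's actual code; the `rounding_offset` parameter and the sample coefficients are assumptions for illustration, and how an encoder chooses the offset per block is not shown.

```python
import math


def quantize(coeff, step, rounding_offset=0.5):
    """Quantize one coefficient to an integer level.

    rounding_offset=0.5 is ordinary round-to-nearest. A smaller offset
    widens the zero bin (the dead zone), so small coefficients vanish.
    """
    level = math.floor(abs(coeff) / step + rounding_offset)
    return int(math.copysign(level, coeff)) if level else 0


coeffs = [-13.0, -6.0, 5.0, 7.0, 9.0, 22.0]  # hypothetical DCT coefficients

plain = [quantize(c, step=10) for c in coeffs]
wide = [quantize(c, step=10, rounding_offset=0.2) for c in coeffs]

print(plain)  # [-1, -1, 1, 1, 1, 2]
print(wide)   # [-1, 0, 0, 0, 1, 2]
```

The wider dead zone zeroes three more coefficients here, and zeros are the cheapest thing to store. An adaptive scheme would pass a different step or offset depending on how visible the block's detail is.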

aadharsh

ok, I want to SEE these tradeoffs, not just read byte counts. show me one zoomed crop across formats and qualities.

thumbnail encoding

Here is a 96-pixel detail crop, the car's body edge against the dark wheel, run through every encoder and blown up so the artifacts show. Start with format against quality.

format × quality · one zoomed crop

One 96-pixel detail crop (the body edge against the dark wheel), run through three formats at three quality tiers and shown pixel-zoomed. Read across a row to compare formats at one quality; read down to watch a format fall apart as quality drops.

quality tiers (high · mid · low)
jpegli: q90 · q50 · q22
WebP: q90 · q50 · q22
AVIF: q78 · q42 · q18

At 96 pixels the absolute bytes mostly reflect each format's fixed container overhead (WebP's is lightest, AVIF's heaviest), not per-pixel efficiency. On full images the ranking flips, which is what the live measurement above shows. What the zoom reveals is the artifact style: JPEG breaks into 8x8 blocks, while WebP and AVIF smear and smudge instead.
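
The overhead effect can be sketched with a toy model where file size is a fixed container cost plus a per-pixel cost. Every number below is made up to show the shape of the effect; none of them are measurements of WebP, AVIF, or any real format, and the 96×96 crop size is an assumption.

```python
def file_size(overhead_bytes, bytes_per_pixel, pixels):
    """Toy model: fixed container overhead plus a per-pixel payload."""
    return overhead_bytes + bytes_per_pixel * pixels


# Hypothetical formats: one with a light container but a costlier payload,
# one with a heavy container but a cheaper payload.
LIGHT = {"overhead_bytes": 50, "bytes_per_pixel": 0.12}
HEAVY = {"overhead_bytes": 400, "bytes_per_pixel": 0.09}


def smaller(pixels):
    """Name the format with the smaller modeled file at this pixel count."""
    light = file_size(pixels=pixels, **LIGHT)
    heavy = file_size(pixels=pixels, **HEAVY)
    return "light" if light < heavy else "heavy"


print(smaller(96 * 96))     # light  (small crop: overhead dominates)
print(smaller(400 * 266))   # heavy  (full photo: per-pixel cost dominates)
```

In this model the ranking flips once the image is large enough for the per-pixel saving to pay back the extra container bytes.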

aadharsh

jpeg goes blocky, the others go smeary. now the jpeg encoders you mentioned, mozjpeg vs the google one?

thumbnail encoding

Same crop, same quality knob, three encoders. The bytes under each tell the story.

three JPEG encoders · same quality

The same crop, the same quality setting (q72), three JPEG encoders. All three write the same JPEG format any decoder reads; the only difference is how cleverly each spends its bits. Watch the byte counts.

baseline · mozjpeg · jpegli

Same quality knob, real bytes measured live: jpegli comes out smallest, well under both mozjpeg and the system baseline, because it models your eye (XYB color, adaptive per-block quantization) and stops spending bits on detail you would not notice. That is the psychovisual win in one crop.

aadharsh

jpegli really is smaller at the same setting. and chroma, up close on a real edge?

thumbnail encoding

The same crop as a JPEG at the three chroma samplings. Luma holds; the color is what softens.

chroma subsampling · a real edge

Chroma subsampling on a real edge instead of stripes: the same crop saved as a JPEG at 4:4:4, 4:2:2, and 4:2:0. The luma edge holds at every setting; the color is what loses resolution.

4:4:4 · 4:2:2 · 4:2:0

The size drop is modest here (4:4:4 to 4:2:0) because this crop is mostly grays. On saturated content the savings grow and the color fringing gets obvious, which is the stark version in the stripes demo above.

aadharsh

so what does the site actually ship per thumbnail?

thumbnail encoding

Each thumbnail is dual-encoded: an AVIF primary and a jpegli JPEG fallback inside one <picture>, plus a 400px AVIF tier for phones. The browser takes the first source it can decode, which is the AVIF wherever AVIF is supported, and never downloads the others.
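
As a rough sketch of what that markup could look like, here is a small builder that emits one <picture> per thumbnail. The file names, the 600px breakpoint for the phone tier, and the function itself are assumptions for illustration; the site's actual template may differ.

```python
from html import escape


def picture_markup(avif_full, avif_small, jpeg_fallback, alt,
                   small_max_width_px=600):
    """Build a <picture> with two AVIF sources and a JPEG <img> fallback.

    small_max_width_px is a hypothetical breakpoint for the phone tier.
    """
    return "\n".join([
        "<picture>",
        f'  <source type="image/avif" '
        f'media="(max-width: {small_max_width_px}px)" '
        f'srcset="{escape(avif_small)}">',
        f'  <source type="image/avif" srcset="{escape(avif_full)}">',
        f'  <img src="{escape(jpeg_fallback)}" alt="{escape(alt)}">',
        "</picture>",
    ])


# Hypothetical file names.
markup = picture_markup("car.avif", "car-400.avif", "car.jpg", "garage car")
print(markup)
```

The AVIF sources come first so a capable browser stops there, and the plain <img> at the end is what every other browser falls back to.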

aadharsh

got it. one photo, several encoders, and the grid hands each browser the cheapest thing it can read. thanks.

end of first pass
This is a recorded conversation. The demo above measures real files.