Log in

No account? Create an account
Gareth Rees


Previous Entry Share Next Entry
Smallest possible transparent PNG

This discussion at drj11’s blog raises the question of how small is the smallest Portable Network Graphics (PNG) image?

Starting out with general principles, it seems likely that we want the smallest possible image (PNG format doesn’t allow empty images, so it’s going to have to be 1×1), with the smallest number of colour channels (PNG supports greyscale images, with a single colour channel per pixel) and the smallest bit depth per channel (PNG supports 1-bit channels).

Let’s get going by creating a 1×1 black-and-white image in GraphicConverter and save it as PNG. The result is 73 bytes long. Is that the answer? Unfortunately it’s not easy to be sure. The PNG standard has a fair bit of flexibility, so it might be the case that GraphicConverter is being wasteful in some way. To see if this is the case, we have to look at GraphicConverter’s output in detail. Here’s a hex dump:

00000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
00000010: 0000 0001 0000 0001 0100 0000 0037 6ef9  .............7n.
00000020: 2400 0000 1049 4441 5478 9c62 6001 0000  $....IDATx.b`...
00000030: 00ff ff03 0000 0600 0557 bfab d400 0000  .........W......
00000040: 0049 454e 44ae 4260 82                   .IEND.B`.

Referring to the PNG specification, those 73 bytes break down as follows:

  • An 8-byte file signature.
  • A 13-byte IHDR chunk containing the image header, plus 12 bytes chunk overhead.
  • A 16-byte IDAT chunk containing the image data, plus 12 bytes chunk overhead.
  • A 0-byte IEND chunk marking the end of the file, plus 12 bytes chunk overhead.

You can see the location of the chunks clearly in the hex dump, because the ASCII chunk types stand out.

We can’t do anything about the size of the file signature or the IHDR and IEND chunks, or the chunk overhead, as these are fixed in size. So any possibility for reduction must lie in the IDAT chunk. This is a zlib data stream of the scanlines from the image, compressed using the “deflate” method.

Using the zlib specification (RFC 1950) we can decode this zlib stream as follows:

  • The header byte 78 meaning “deflate compression with a 32 KiB window”.
  • The informational byte 9c meaning “the default compression algorithm was used” (plus a checksum).
  • 10 bytes of compressed data: 62 60 01 00 00 00 ff ff 03 00.
  • A 4-byte Adler32 checksum: 00 06 00 05.

It’s easy to decompress the IDAT chunk in Python:

>>> import zlib
>>> import codecs
>>> idat = codecs.decode('789c626001000000ffff030000060005', 'hex')
>>> zlib.decompress(idat)

What’s this pair of bytes? The section on scanline serialization in the PNG specification explains that the first byte, 00, is the filter type for the first scanline of the image (in this case 0 means “no filtering”) and the second byte, 04, contains the packed samples for pixels in the scanline. PNG is big-endian, so the single pixel value of 0 appears in the high bit of the byte; the remaining seven bits are unused (why they contain the value 4 is a bit of a mystery).

Now we can turn to the DEFLATE Compressed Data Format Specification (RFC 1951). A DEFLATE stream is a little-endian bit stream. So let’s write it out in little-endian order:

---62--- ---60--- ---01--- ---00--- ---00--- ---00--- ---ff--- ---ff--- ---03--- ---00---
01000110 00000110 10000000 00000000 00000000 00000000 11111111 11111111 11000000 00000000
\-------first block---------/\--------------second block--------------/ \--third block--/

It decodes as follows:

First block:
0 = not final block
10* = Huffman coding block with fixed codes
00110000 = literal byte 00
00110100 = literal byte 04
0000000 = end of block
Second block:
0 = not final block
00 = non compression block
000 = padding to end of byte
00000000 00000000 = number of data bytes in the block (that is, zero)
11111111 11111111 = one’s complement of number of data bytes
Third block:
1 = final block
10* = Huffman coding block with fixed codes
0000000 = end of block
000000 = padding to end of byte

* In the DEFLATE specification the authors do their best to confuse the reader by giving the block types in big-endian order despite everything else being in little-endian order: see section 3.2.3 where block type “10” is given as “01” and vice versa.

The GraphicConverter encoding is clearly wasteful: the last two blocks contain no data and could be dropped if we made the first block the final block by changing the first byte to 63. And indeed this is just what the zlib compression library outputs:

>>> codecs.encode(zlib.compress('\x00\x04'), 'hex')

(So did Thorsten Lemke, author of GraphicConverter, also write his own DEFLATE compressor?)

Anyway, putting the whole thing together:

>>> import struct
>>> def chunk(type, data):
...     return (struct.pack('>I', len(data)) + type + data
...             + struct.pack('>i', zlib.crc32(type + data)))
>>> png = ('\x89PNG\r\n\x1A\n'
...        + chunk('IHDR', struct.pack('>IIBBBBB', 1, 1, 1, 0, 0, 0, 0))
...        + chunk('IDAT', zlib.compress(struct.pack('>BB', 0, 0)))
...        + chunk('IEND', ''))
>>> len(png)

So 67 bytes is the answer to the question we started with.

But drj11’s original question was about PNGs with an alpha channel. When an alpha channel is present, the smallest bit depth supported by the PNG format is 8 bits per channel. So a 1×1 transparent PNG has a five-byte scan line (one byte for the filter, one byte each for the red, green, blue, and alpha channels).

>>> codecs.encode(zlib.compress('\x00' * 5), 'hex')
>>> len(_) / 2

Which makes a PNG 68 bytes long. And indeed you can find the occasional claim that this is the smallest possible transparent PNG.

But in the case where the pixel is completely transparent (alpha = 0) it’s possible to shave off one more byte by exploiting a feature of the DEFLATE compression format (or rather, exploiting it slightly more cleverly than zlib is able to). DEFLATE allows you to duplicate a portion of the output stream by referring to it with a special pair <length of portion to duplicate, distance backwards>. And crucially, the referenced string is allowed to overlap with the current position. As it says in the DEFLATE spec:

if the last 2 bytes decoded have values X and Y, a string reference with <length = 5, distance = 2> adds X,Y,X,Y,X to the output stream.

So we can compress the five byte scanline by encoding a zero byte, then the string reference <length = 4, distance = 1>, like this:

1 = final block
10 = Huffman coding block with fixed codes
00110000 = literal byte 00
0000010 = duplicate string of length 4...
00000 = ... and distance 1 before current position
0000000 = end of block
00 = padding to end of byte

That packs into 4 bytes:

---63--- ---00--- ---01--- ---00---
11000110 00000000 10000000 00000000

Plus the two bytes of zlib header, 78 9c, and the Adler-32 checksum, 00 05 00 01, makes 10 bytes. Putting the whole PNG together:

>>> png = ('\x89PNG\r\n\x1A\n'
...        + chunk('IHDR', struct.pack('>IIBBBBB', 1, 1, 8, 6, 0, 0, 0))
...        + chunk('IDAT', codecs.decode('789c6300010000050001', 'hex'))
...        + chunk('IEND', ''))
>>> len(png)

So, bandwidth misers using 1×1 transparent PNGs, read this article and save a byte!


[User Picture]
Date:2007‒11‒16T09:25 (UTC)
But fun.
(Replies frozen) (Thread)
Date:2007‒11‒16T11:01 (UTC)

I created my 1x1 png with pnmtopng and it was 95 octets. I seem to have two versions of pnmtopng on my PATH and one of them seems to throw away the alpha channel (despite understanding the option that specifies the alpha channel). Glad I didn't use that one then.
(Replies frozen) (Thread)
[User Picture]
Date:2007‒11‒16T11:38 (UTC)
Yes, pnmtopng added a PLTE (palette) chunk to your image. The source says:
     We can write a palette file if maxval <= 255 and one of the following is
       - for ppm files if we have <= 256 colors
       - for alpha if we have <= 256 color/transparency pairs
       - for pgm files (with or without alpha) if the number of bits needed for
         the gray-transparency pairs is smaller than the number of bits needed
         for maxval
     When maxval > 255, we never write a paletted image.
This is a bit naive. With a truecolour image, it costs 4 bytes for each palette entry, plus 12 bytes overhead for the palette chunk. And you gain 3 bytes for each pixel before compression. If compression could be ignored then you should use a truecolour palette if 3P > 4C + 12, where P is the number of pixels and C the number of colours. Unfortunately compression can't be ignored, because the actual pixel values may compress better than their palette entries. So there's really no substitute for trying it both ways and comparing.

Trying lots of different encoding options and picking the smallest result is the kind of thing that PNGcrush and OptiPNG do, by the way.
(Replies frozen) (Parent) (Thread)
garethrees.org Powered by LiveJournal.com