Reputation: 797
My original PDF file is around 24 MB, but when I encode it as a Base64 string, the string is around 31 MB. I'm wondering why that is.
The increase would be easy to understand for an image file, since it might lose some compression, but why does it also happen with PDFs and other file formats?
Upvotes: 25
Views: 20935
Reputation: 1074028
> just wondering why
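Because Base64 carries fewer meaningful bits per byte than binary data: 6 instead of 8. Every 3 bytes of input therefore become 4 bytes of output, an increase of about 33% (plus padding and any line breaks the encoder adds), which accounts for a file of around 24 MB growing to a little over 30 MB. This is by design, so that the data can survive various textual transformations that raw binary data would not.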
Wikipedia's page has a good diagram showing this:
As a text table (sadly the GitHub-flavored markdown used by SO doesn't support tables with varying numbers of columns):
+----------------+-------------------------------+-------------------------------+-------------------------------+
| Text content   |               M               |               a               |               n               |
+----------------+-------------------------------+-------------------------------+-------------------------------+
| ASCII          |           77 (0x4d)           |           97 (0x61)           |          110 (0x6e)           |
| Bit pattern    | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
| Index          |          19           |          22           |           5           |          46           |
| Base64-encoded |           T           |           W           |           F           |           u           |
+----------------+-----------------------+-----------------------+-----------------------+-----------------------+
Note how the Base64 is only using the bottom six bits of each byte, and so "Man" ends up being four bytes long.
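The regrouping in the table can be reproduced in a few lines of Python. This is only a sketch: `six_bit_indices` and `encoded_length` are hypothetical helper names made up for illustration, the helper skips padding (so it only accepts input whose length is a multiple of 3), and the size estimate assumes the encoder adds no line breaks.

```python
import base64

# Standard Base64 alphabet: index 0 is "A", index 63 is "/".
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def six_bit_indices(data: bytes) -> list[int]:
    """Regroup the bits of `data` into 6-bit values (illustrative helper)."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch skips padding; pass a multiple of 3 bytes")
    # Concatenate the 8-bit patterns, then cut the bit string every 6 bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]


def encoded_length(n: int) -> int:
    """Padded Base64 length for n input bytes, assuming no line breaks."""
    return 4 * ((n + 2) // 3)


indices = six_bit_indices(b"Man")
encoded = "".join(ALPHABET[i] for i in indices)

print(indices)  # [19, 22, 5, 46]
print(encoded)  # TWFu
print(encoded == base64.b64encode(b"Man").decode("ascii"))  # True
print(encoded_length(24_000_000))  # 32000000
```

For an input of exactly 24,000,000 bytes the estimate is 32,000,000 characters, which is in the neighborhood of the 31 MB reported in the question (the question gives only approximate sizes).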
> It is easy to understand for image file since it may lose some compression
Just to be clear, Base64 encoding is lossless. When you decode it, you get byte-for-byte what you started with.
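A quick way to check the round trip, sketched with Python's standard `base64` module. The random bytes are just a stand-in for the contents of a PDF (an assumption for illustration); any byte string behaves the same way.

```python
import base64
import os

# Random bytes standing in for a binary file such as a PDF (assumption).
original = os.urandom(3_000)

encoded = base64.b64encode(original)  # ASCII-only bytes, 4 per 3 input bytes
decoded = base64.b64decode(encoded)

print(decoded == original)          # True
print(len(original), len(encoded))  # 3000 4000
```

The encoded form is larger, but nothing is discarded: decoding returns exactly the original bytes.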
Upvotes: 48