I have the following base64 string:
R1NNQiBBZ2VuY3kgR21iSCAvIFdlYmRlc2lnbiBBZ2VudHVyIFVsbSAvIE9ubGluZXNob3AgQWdlbnR1ciAvIEFwcCBBZ2VudHVyIFVsbSwgR2VybWFueS==
And using an online base64 decoder I get the following result:
GSMB Agency GmbH / Webdesign Agentur Ulm / Onlineshop Agentur / App Agentur Ulm, Germany
All good, right? But if I convert this text back to base64, the result becomes
R1NNQiBBZ2VuY3kgR21iSCAvIFdlYmRlc2lnbiBBZ2VudHVyIFVsbSAvIE9ubGluZXNob3AgQWdlbnR1ciAvIEFwcCBBZ2VudHVyIFVsbSwgR2VybWFueQ==
The two strings differ only in the last character before the padding (S vs. Q). Any ideas?
This is the C# code I am using for decoding:
string basestring = "R1NNQiBBZ2VuY3kgR21iSCAvIFdlYmRlc2lnbiBBZ2VudHVyIFVsbSAvIE9ubGluZXNob3AgQWdlbnR1ciAvIEFwcCBBZ2VudHVyIFVsbSwgR2VybWFueS==";
string output = Encoding.UTF8.GetString(Convert.FromBase64String(basestring));
return output;
And here's the encoding part
string basestring = "GSMB Agency GmbH / Webdesign Agentur Ulm / Onlineshop Agentur / App Agentur Ulm, Germany";
string output = Convert.ToBase64String(Encoding.UTF8.GetBytes(basestring));
return output;
Upvotes: 3
Views: 737
Reputation: 4105
This is actually an artefact of moving from 8-bit encoding (UTF8) to a 6-bit encoding (Base64).
For reference, here's the Base64 encoding table: indices 0-25 map to A-Z, 26-51 to a-z, 52-61 to 0-9, and 62/63 to + and /.
Take the string "AB" as an example: A and B are character codes 65 and 66 respectively. In 8-bit binary, 65/66 are 01000001/01000010.
When encoding to Base64, the same bits of your string are separated into groups of 6 instead of 8. So the 16-bit sequence above is split into 010000/010100/0010 (same bit pattern, just grouped differently).
Now, the first two groups are easy. Look them up in the encoding table above, and you'll see that 010000 = Q and 010100 = U. That leaves the last group with only 4 bits instead of the expected 6. This is where things get interesting.
When encoding, the end is usually padded with zeroes to get to 6 bits. So your 0010 becomes 001000, which is I. So "AB" encoded in Base64 becomes "QUI=". The = is padding that brings the number of characters up to a multiple of 4. Some decoders treat it as optional, but .NET's Convert.FromBase64String requires it.
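The regrouping described above can be sketched by hand. This is only an illustration, not how the framework implements it: EncodeByHand and Alphabet are hypothetical names introduced here, and the sketch assumes the standard Base64 alphabet and .NET 6 or later (top-level statements).

```csharp
using System;
using System.Linq;
using System.Text;

// Standard Base64 alphabet: the index in this string is the 6-bit value.
const string Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: Base64-encode a string by regrouping its bits by hand.
string EncodeByHand(string text)
{
    // All UTF-8 bytes written out as one long run of binary digits.
    string bits = string.Concat(
        Encoding.UTF8.GetBytes(text)
            .Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));

    // Pad the final group with zero bits up to a multiple of 6.
    int padBits = (6 - bits.Length % 6) % 6;
    bits += new string('0', padBits);

    // Map each 6-bit group to its character in the alphabet.
    var sb = new StringBuilder();
    for (int i = 0; i < bits.Length; i += 6)
        sb.Append(Alphabet[Convert.ToInt32(bits.Substring(i, 6), 2)]);

    // Append '=' until the character count is a multiple of 4.
    while (sb.Length % 4 != 0) sb.Append('=');
    return sb.ToString();
}

Console.WriteLine(EncodeByHand("AB"));
Console.WriteLine(Convert.ToBase64String(Encoding.UTF8.GetBytes("AB")));
```

Both lines should print the same text, since the hand-rolled version pads with zero bits just as the built-in encoder does.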
Remember when your last group of 0010 was padded to become 6 bits? Here's the fun part: the padding bits don't have to be zeroes. The 16 bits (2x8) in your original string became 18 bits (3x6) because of the padding. Since 18 is not a multiple of 8, the decoder knows to drop the 2 excess bits. So the two padding bits could be anything, and a decoder that does not validate them will still decode the string properly.
0010 when padded could be 001000, 001001, 001010, or 001011, which translate to I, J, K, or L. Bring up any decoder, and try decoding QUI, QUJ, QUK, and QUL (append = if the decoder insists on padding characters). A decoder that ignores the padding bits will decode all of them to "AB".
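As a quick check of the claim above, the four variants can be fed to the same decoder the question uses. This is a minimal sketch; it assumes Convert.FromBase64String ignores the trailing padding bits, which is the behaviour the question's own decoding code demonstrates. The = is included because that method rejects input whose length is not a multiple of 4.

```csharp
using System;
using System.Text;

// The third character differs only in its two lowest (padding) bits.
foreach (var encoded in new[] { "QUI=", "QUJ=", "QUK=", "QUL=" })
{
    string decoded = Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
    Console.WriteLine($"{encoded} -> {decoded}");
}
```

A stricter decoder may reject the last three inputs as non-canonical, so this result should not be assumed for every Base64 library.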
Now, your string, when split into 6-bit groups, looks like the following (see fiddle):
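using System;
using System.Linq;
using System.Text;

var basestring = "GSMB Agency GmbH / Webdesign Agentur Ulm / Onlineshop Agentur / App Agentur Ulm, Germany";
// Write each UTF-8 byte as 8 binary digits, then regroup the digits in sixes.
var sixBitGroups = Encoding.UTF8.GetBytes(basestring)
    .SelectMany(b => Convert.ToString(b, 2).PadLeft(8, '0'))
    .Chunk(6) // Enumerable.Chunk requires .NET 6 or later
    .Select(c => new string(c));
Console.WriteLine(string.Join("/", sixBitGroups));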
You'll notice that it ends with ../01. That 01 needs to be padded with 4 extra bits. Again, these are usually zeroes, making it 010000, which is Q. So your encoded string ends with ..FueQ==. But once you realise that they don't have to be all zeroes, you'll see in the table that 01xxxx covers indices 16 to 31, that is, Q through Z and a through f. This explains why your base64 ..FueS== still decodes to the exact same string: S is 010010, and the decoder discards the trailing 0010.
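To see this on the string from the question, the sketch below swaps the last data character for each of the 16 characters whose top two bits are 01 and decodes every variant. It assumes the same lenient behaviour of Convert.FromBase64String as above. Alphabet and IsCanonical are hypothetical names introduced for illustration: IsCanonical simply decodes and re-encodes, which is one way to detect a non-canonical input such as the one ending in S==.

```csharp
using System;
using System.Text;

const string Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

string canonical = "R1NNQiBBZ2VuY3kgR21iSCAvIFdlYmRlc2lnbiBBZ2VudHVyIFVsbSAvIE9ubGluZXNob3AgQWdlbnR1ciAvIEFwcCBBZ2VudHVyIFVsbSwgR2VybWFueQ==";
string prefix = canonical.Substring(0, canonical.Length - 3); // drop the trailing "Q=="
string expected = Encoding.UTF8.GetString(Convert.FromBase64String(canonical));

// Hypothetical helper: true only if re-encoding the decoded bytes reproduces the input.
bool IsCanonical(string s) =>
    Convert.ToBase64String(Convert.FromBase64String(s)) == s;

// Indices 16..31 (Q through f) all start with the bits 01.
foreach (char last in Alphabet.Substring(16, 16))
{
    string variant = prefix + last + "==";
    string decoded = Encoding.UTF8.GetString(Convert.FromBase64String(variant));
    Console.WriteLine($"{last}: same text = {decoded == expected}, canonical = {IsCanonical(variant)}");
}
```

Only the variant ending in Q== should report itself as canonical, because the built-in encoder always pads with zero bits.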