Reputation: 825
I would appreciate if someone could please explain this to me.
I came across this post (not important just reference) and saw a token encoded with base64 where the guy decoded it.
EYl0htUzhivYzcIo+zrIyEFQUE1PQkk= -> t3+(:APPMOBI
I then tried to encode t3+(:APPMOBI again using base64 to see if I would get the same result, but was very surprised to get:
t3+(:APPMOBI - > dDMrKDpBUFBNT0JJ
Completly different token.
I then tried to decode the original token EYl0htUzhivYzcIo+zrIyEFQUE1PQkk=
and got t3+(:APPMOBI with random characters between it. (I got ◄ëtå╒3å+╪═┬(√:╚╚APPMOBI
could be wrong, I quickly did it off the top off my head)
What is the reason for the difference in tokens were they not supposed to be the same?
Upvotes: 1
Views: 1107
Reputation: 4391
The problem is in the encoding that displayed the output to you, or in the encoding that you used to input the data to base64. This is actually the problem that base64 encoding was invented to help solve.
Instead of trying to copy and paste the non-ASCII characters, save the output as a binary file, then examine it. Then, encode the binary file. You'll see the same base64 string.
c:\TEMP>type b.txt
EYl0htUzhivYzcIo+zrIyEFQUE1PQkk=
c:\TEMP>base64 -d b.txt > b.bin
c:\TEMP>od -t x1 b.bin
0000000 11 89 74 86 d5 33 86 2b d8 cd c2 28 fb 3a c8 c8
0000020 41 50 50 4d 4f 42 49
c:\TEMP>base64 -e b.bin
EYl0htUzhivYzcIo+zrIyEFQUE1PQkk=
od is a tool (octal dump) that outputs binary data using hexadecimal notation, and shows each of the bytes.
EDIT:
You asked about a different string in your comments, dDMrKDpBUFBNT0JJ, and why does that decode to the same thing? Well, it doesn't decode to the same thing. It decodes to this string of bytes: 74 33 2b 28 3a 41 50 50 4d 4f 42 49. Your original string decoded to this string of bytes: 11 89 74 86 d5 33 86 2b d8 cd c2 28 fb 3a c8 c8 41 50 50 4d 4f 42 49.
Notice the differences: your original string decoded to 23 bytes, your second string decoded to only 12 bytes. The original string included non-ASCII bytes like 11, d5, d8, cd, c2, fb, c8, c8. These bytes don't print the same way on every system. You referred to them as "random bytes", but they're not. They're part of the data, and base64 is designed to make sure they can be transmitted.
I think to understand why these strings are different, you need to first understand the nature of character data, what base64 is, and why it exists. Remember that computers work only on numbers, but people need to work with familiar concepts like letters and digits. So ASCII was created as an "encoding" standard that represents a little number (we call this little number a "byte") as a letter or a digit, so that we humans can read it. If we line up a group of bytes, we can spell out a message. 41 50 50 4d 4f 42 49 are the bytes that represent the word APPMOBI. We call a group of bytes like this a "string".
Every letter from A-Z and every digit from 0-9 has a number specified in ASCII that represents it. But there are many extra numbers that are not in the standard, and not all of those represent visible or sensible letters or digits. We say they're non-printable. Your longer message includes many bytes that aren't printable (you called them random.)
When a computer program like email is dealing with a string, if the bytes are printable ASCII characters, it's easy. The email program knows what to do with them. But if your bytes instead represent a picture, the bytes could have values that aren't ASCII, and various email programs won't know what to do with them. Base64 was created to take all kinds of bytes, both printable and non-printable bytes, and translate them into a string of bytes representing only printable letters. Because they're all printable, a program like email or a web server can easily handle them, even if it doesn't know that they actually contain a picture.
Here's the decode of your new string:
c:\TEMP>type c.txt
dDMrKDpBUFBNT0JJ
c:\TEMP>base64 -d c.txt
t3+(:APPMOBI
c:\TEMP>base64 -d c.txt > c.bin
c:\TEMP>od -t x1 c.bin
0000000 74 33 2b 28 3a 41 50 50 4d 4f 42 49
0000014
c:\TEMP>type c.bin
t3+(:APPMOBI
c:\TEMP>
Upvotes: 2
Reputation: 136
The whole purpose of base64 encoding is to encode binary data into text representation so that they can be transmitted over the network or displayed without corruption. But it ironically happened with the original post you were referring to,
EYl0htUzhivYzcIo+zrIyEFQUE1PQkk= does NOT decode to t3+(:APPMOBI
instead, it contains some binary bytes(not random btw) that you correctly showed. So the problem was due to the original post where either the author or the tool/browser that she/he used "cleaned up", or rather corrupted the decoded binary data.
There is always one-to-one relationship between encoded and decoded data (provided the same "base" is used, i.e. the same set of characters are used for encoded text.)
t3+(:APPMOBI indeed will be encoded into dDMrKDpBUFBNT0JJ
Upvotes: 2