Reputation: 1745
I found some code online that I am trying to work through, which decodes base64. I know Python has base64.urlsafe_b64decode(), but I would like to learn a bit more about what is going on.
The JS atob looks like:
function atob (input) {
  var chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
  var str = String(input).replace(/=+$/, '');
  if (str.length % 4 == 1) {
    throw new InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.");
  }
  for (
    // initialize result and counters
    var bc = 0, bs, buffer, idx = 0, output = '';
    // get next character
    buffer = str.charAt(idx++);
    // character found in table? initialize bit storage and add its ascii value;
    ~buffer && (bs = bc % 4 ? bs * 64 + buffer : buffer,
      // and if not first of each 4 characters,
      // convert the first 8 bits to one ascii character
      bc++ % 4) ? output += String.fromCharCode(255 & bs >> (-2 * bc & 6)) : 0
  ) {
    // try to find character in table (0-63, not found => -1)
    buffer = chars.indexOf(buffer);
  }
  return output;
}
My goal is to port this to Python, but I am trying to understand what the for loop is doing in JavaScript.
It checks whether the value is located in the chars table and then initializes some variables using a ternary like: bs = bc % 4 ? bs*64+buffer : buffer, bc++ % 4
I am not quite sure I understand what the buffer, bc++ % 4 part of the ternary is doing. The comma confuses me a bit. Plus, String.fromCharCode(255 & (bs >> (-2 * bc & 6))) is a bit esoteric to me.
I've been trying something like this in Python, which produces some results, albeit different than what the javascript implementation is doing
import re

# Test subject
b64_str: str = "fwHzODWqgMH+NjBq02yeyQ=="
# Lookup table for characters
chars: str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
# Replace right padding with empty string
replaced = re.sub("=+$", '', b64_str)
if len(replaced) % 4 == 1:
    raise ValueError("atob failed. The string to be decoded is not valid base64")
# Bit storage and counters
bc = 0
out: str = ''
for i in replaced:
    # Get ascii value of character
    buffer = ord(i)
    # If counter is evenly divisible by 4, return buffer as is, else add the ascii value
    bs = bc * 64 + buffer if bc % 4 else buffer
    bc += 1 % 4  # Not sure I understand this part
    # Check if character is in the chars table
    if i in chars:
        # Check if the bit storage and bit counter are non-zero
        if bs and bc:
            # If so, convert the first 8 bits to an ascii character
            out += chr(255 & bs >> (-2 * bc & 6))
        else:
            out = 0
    # Set buffer to the index of where the first instance of the character is in the b64 string
    print(f"before: {chr(buffer)}")
    buffer = chars.index(chr(buffer))
    print(f"after: {buffer}")
print(out)
JS gives ó85ªÁþ60jÓlÉ
Python gives 2:u1(²ë:ð1G>%Y
Upvotes: -2
Views: 56
Reputation: 177685
Here is a tested version https://www.online-python.com/PiseKNFuaO
import base64

class InvalidCharacterError(Exception):
    pass

def atob(input_str):
    chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='
    input_str = str(input_str).rstrip('=')
    if len(input_str) % 4 == 1:
        raise InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.")
    output = []
    bc = 0
    bs = 0
    buffer = 0
    for char in input_str:
        buffer = chars.find(char)
        if buffer == -1:
            raise InvalidCharacterError("'atob' failed: The string to be decoded contains an invalid character.")
        bs = (bs << 6) + buffer
        bc += 6
        if bc >= 8:
            bc -= 8
            output.append(chr((bs >> bc) & 255))
    return ''.join(output)

# Compare with Python's built-in Base64 decoding
def test_atob():
    test_strings = [
        "SGVsbG8gd29ybGQ=",  # "Hello world"
        "U29mdHdhcmUgRW5naW5lZXJpbmc=",  # "Software Engineering"
        "VGVzdGluZyAxMjM=",  # "Testing 123"
        "SGVsbG8gd29ybGQ==",  # "Hello world" with extra padding
        "SGVsbG8gd29ybGQ= ",  # "Hello world" with trailing space (invalid)
        "SGVsbG8gd29ybGQ\r\n",  # "Hello world" with newline characters (invalid)
        "Invalid!!==",  # Invalid characters
        "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZyE",  # "This is an encoded string!" without padding
        "U29tZVNwZWNpYWwgQ2hhcnM6ICsgLyA=",  # "SomeSpecial Chars: + / " with padding
    ]
    for encoded in test_strings:
        try:
            expected = base64.b64decode(encoded).decode('utf-8')
            result = atob(encoded)
            print(result == expected, "Custom:", result, "Expected:", expected)
        except Exception as e:
            print(f"Error for string: {encoded} - {e}")

test_atob()
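For comparison, here is a rough sketch of a more literal, statement-by-statement port of the JavaScript loop from the question, which may help with the comma and the shift. The name atob_literal and the comments are my own, and raising ValueError in place of InvalidCharacterError is an assumption for brevity. In JS the comma operator evaluates both sides and yields the right one, so the bs assignment runs first and the ternary is then decided by the old value of bc % 4.

```python
import base64

CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="


def atob_literal(input_str):
    """Hypothetical literal port of the JS for loop (illustration only)."""
    s = str(input_str).rstrip("=")
    if len(s) % 4 == 1:
        raise ValueError("'atob' failed: The string to be decoded is not correctly encoded.")

    bc = 0  # number of table characters consumed so far
    bs = 0  # bit storage for the current group of up to 4 characters
    output = []
    for ch in s:
        # Loop body in JS: look the character up (-1 when it is not in the table).
        buffer = CHARS.find(ch)
        # `~buffer` is 0 (falsy) only when buffer == -1, so unknown characters are skipped.
        if buffer == -1:
            continue
        # Left side of the comma: start over on the first character of a group,
        # otherwise shift in 6 more bits (bs * 64 is the same as bs << 6).
        bs = bs * 64 + buffer if bc % 4 else buffer
        # Right side of the comma: `bc++ % 4` uses the OLD bc, then increments it.
        not_first_in_group = bc % 4
        bc += 1
        if not_first_in_group:
            # With the NEW bc: bc % 4 == 2 gives 4, == 3 gives 2, == 0 gives 0.
            shift = (-2 * bc) & 6
            output.append(chr(255 & (bs >> shift)))
    return "".join(output)


sample = "fwHzODWqgMH+NjBq02yeyQ=="
print(atob_literal(sample) == base64.b64decode(sample).decode("latin-1"))
```

The shift drops the bits that belong to bytes not yet complete, and 255 & keeps the lowest 8 bits of what remains. Unlike the version above, this sketch silently skips characters that are not in the table, which is what the ~buffer test does in the JS code.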
Upvotes: 1
Reputation: 36340
"I've been trying something like this in Python, which produces some results, albeit different than what the javascript implementation is doing"
The first step would be to determine whether either implementation works correctly. RFC 4648 contains test vectors for that purpose:
BASE64("") = ""
BASE64("f") = "Zg=="
BASE64("fo") = "Zm8="
BASE64("foo") = "Zm9v"
BASE64("foob") = "Zm9vYg=="
BASE64("fooba") = "Zm9vYmE="
BASE64("foobar") = "Zm9vYmFy"
If one implementation works correctly, you should determine what is causing the difference in the other. Otherwise, you might attempt to implement base64 decoding based on the description in RFC 4648.
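One possible way to run these vectors against a decoder in Python is sketched below. The names RFC4648_VECTORS, check_decoder and reference_decode are made up for this example, and the standard library decoder is used only as a stand-in for whichever implementation you want to check.

```python
import base64

# (plain text, base64 text) pairs taken from the RFC 4648 test vectors
RFC4648_VECTORS = [
    ("", ""),
    ("f", "Zg=="),
    ("fo", "Zm8="),
    ("foo", "Zm9v"),
    ("foob", "Zm9vYg=="),
    ("fooba", "Zm9vYmE="),
    ("foobar", "Zm9vYmFy"),
]


def check_decoder(decode):
    """Return a list of (encoded, expected, got) for every vector that fails."""
    failures = []
    for plain, encoded in RFC4648_VECTORS:
        try:
            got = decode(encoded)
        except Exception as exc:  # a crash counts as a failed vector
            got = f"{type(exc).__name__}: {exc}"
        if got != plain:
            failures.append((encoded, plain, got))
    return failures


def reference_decode(encoded):
    # Stand-in decoder; replace with the function under test.
    return base64.b64decode(encoded).decode("ascii")


for failure in check_decoder(reference_decode):
    print("FAILED:", failure)
```

Passing your own decoder to check_decoder instead of reference_decode shows which vectors it gets wrong, which narrows down where the two implementations diverge.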
Upvotes: 0