Lamp Post

Reputation: 3

What makes repeated base64 and base32 encoding so slow, and how can I make it faster?

I was trying out some different encodings, and when I encoded some text repeatedly in base64/base32 (which one is used depends on a semi-random boolean list), I noticed that it was ridiculously slow. I didn't understand this because I thought those encodings were particularly fast. I can't figure out why it's so slow, so it'd be cool if you could help me.

This is the relevant part of the code:

from base64 import b64encode, b32encode
from random import random as rn

big_number = int(input("The number of encoding layers : "))
bool_list = [True if rn() < 0.5 else False for _ in range(big_number)]
sample_text = bytes("lorem ipsum", "utf8")
for curr_bool in bool_list:
    temp = b64encode(sample_text) if curr_bool else b32encode(sample_text)
    sample_text = temp

Upvotes: -2

Views: 1128

Answers (1)

JosefZ

Reputation: 30123

These are memory- and time-expensive operations. This answer is based on the pivotal comment by Wups:

If you encode bytes with base64, the result is longer than the input. If you take this result and encode it again in a loop, you have exponential growth.

The following modified script shows growing ratio for base64 and base32 encodings:

from base64 import b64encode, b32encode
from random import random as rn

big_number = int(input("The number of encoding layers : "))
bool_list = [rn() < 0.5 for _ in range(big_number)]
sample_text = bytes("lorem ipsum", "utf8")
sample_len = len(sample_text)
current_len = sample_len
for ii, curr_bool in enumerate(bool_list):
    sample_text = b64encode(sample_text) if curr_bool else b32encode(sample_text)
    print(ii, curr_bool, len(sample_text), len(sample_text) / current_len)
    current_len = len(sample_text)

**Sample output** (truncated): python .\SO\71009943.py

The number of encoding layers : 30
…
24 True 172320 1.3333333333333333
25 False 275712 1.6
26 True 367616 1.3333333333333333
27 False 588192 1.600017409470752
28 True 784256 1.3333333333333333
29 False 1254816 1.6000081606006202
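Those per-layer ratios are not specific to this run: for n input bytes, padded base64 always emits 4·ceil(n/3) bytes and padded base32 emits 8·ceil(n/5) bytes, which is where the 4/3 and 1.6 factors come from. A minimal sketch checking those formulas against the standard library:

```python
from base64 import b64encode, b32encode
from math import ceil

data = b"lorem ipsum"
for _ in range(5):
    n = len(data)
    # base64 maps every 3-byte group to 4 output bytes (padded)
    assert len(b64encode(data)) == 4 * ceil(n / 3)
    # base32 maps every 5-byte group to 8 output bytes (padded)
    assert len(b32encode(data)) == 8 * ceil(n / 5)
    data = b64encode(data)  # feed the output back in, as the question does
```

Once n is large, the ceilings barely matter, so each layer multiplies the length by almost exactly 4/3 or 1.6.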

So the resulting length grows like compound interest, landing somewhere between base64's per-layer factor of 4/3 and base32's 1.6:

big_number = 30
round( sample_len * (4/3) ** big_number)
# 61596
round( sample_len * 1.6 ** big_number)
# 14621508

and for bigger numbers:

big_number = 50
round( sample_len * (4/3) ** big_number)
# 19423591
round( sample_len * 1.6 ** big_number)
# 176763184868

and

big_number = 99
round( sample_len * (4/3) ** big_number)
# 25723354884215
round( sample_len * 1.6 ** big_number)
# 1775296791184759324672
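Since each layer independently flips a fair coin between the two encodings, the expected multiplier per layer is the mean of the two factors, (4/3 + 1.6)/2 = 22/15 ≈ 1.47 (ignoring padding). A rough back-of-envelope estimate along those lines (my own extrapolation, not part of the original answer):

```python
sample_len = len(b"lorem ipsum")  # 11 bytes

# expected growth factor per layer for an independent 50/50 choice
# between base64 (x 4/3) and base32 (x 1.6), padding ignored
expected_factor = (4 / 3 + 1.6) / 2  # == 22/15

for big_number in (30, 50, 99):
    estimate = round(sample_len * expected_factor ** big_number)
    print(big_number, estimate)
```

Whatever the coin flips turn out to be, the final size is exponential in `big_number`, and so is the total work, since every layer has to re-read the entire output of the layer before it. That, not the speed of any single `b64encode` call, is what makes the loop slow.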

Upvotes: 1
