Reputation: 83
Hi, I need help getting a base64-encoded column. What I have is a SHA-256 hashed column, and I expect to get 44 characters, but when I try this in Python:
[base64.b64encode(x.encode('utf-8')).decode() for x in xxx['yyy']]
it returns 88 characters. Can anyone help with this? Basically, I want to reproduce the steps shown in the pictures below in Python. Thanks!
Upvotes: 4
Views: 22529
Reputation: 9509
This answer on the Cryptography Stack Exchange discusses why you're getting 64 characters. Basically, for historical reasons hashes are typically hex encoded, which takes 64 characters for a SHA-256 digest, whereas the same digest base64 encoded takes only 44. If you need it base64 encoded, there is a way to do it. The following gives you a base64-encoded hash:
from base64 import b64encode
from hashlib import sha256
email = '[email protected]'
email_as_bytes = email.encode('utf-8')
hash_as_bytes = b64encode(sha256(email_as_bytes).digest())
hash_b64 = hash_as_bytes.decode('utf-8')  # named hash_b64 to avoid shadowing the built-in hash()
Since b64encode and sha256 both operate on bytes, we can chain them together, and the resulting code isn't too terrible.
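To make the 64 versus 44 character difference concrete, here is a minimal sketch that wraps the chain above in a function and prints both lengths. The function name `sha256_base64` and the sample input `'hello'` are placeholders chosen for illustration, not something from the question.

```python
from base64 import b64encode
from hashlib import sha256


def sha256_base64(text: str) -> str:
    """Return the base64 text of the SHA-256 digest of `text` (UTF-8 encoded)."""
    digest = sha256(text.encode('utf-8')).digest()  # 32 raw bytes
    return b64encode(digest).decode('ascii')


sample = 'hello'  # placeholder input, not the address from the question
digest_hex = sha256(sample.encode('utf-8')).hexdigest()

print(len(digest_hex))             # 64: two hex characters per byte
print(len(sha256_base64(sample)))  # 44: four base64 characters per 3 bytes, padded
```

Both strings describe the same 32 bytes; only the text encoding differs.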
Upvotes: 0
Reputation: 31319
The step in the first image consists of a few substeps:
So:
from hashlib import sha256
s = '[email protected]'
h = sha256()
h.update(s.encode('utf-8')) # specifying encoding, optional as this is the default
hex_string = h.digest().hex()
print(hex_string)
The second image seems to suggest it takes that hex representation as text again, and base64 encodes it - but really it takes the byte string represented by the hex string and encodes that.
So, starting with the hex string:
from base64 import b64encode
digest_again = bytes.fromhex(hex_string)
b64bytes = b64encode(digest_again)
# no real need to specify 'ascii', the relevant code points overlap with UTF-8:
result = b64bytes.decode('ascii')
print(result)
Put together:
from hashlib import sha256
from base64 import b64encode
s = '[email protected]'
h = sha256()
h.update(s.encode())
print(h.digest().hex())
b64bytes = b64encode(h.digest())
print(b64bytes.decode())
Output:
b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514
tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ=
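As a quick sanity check, the two output lines can be decoded back to bytes and compared. This sketch only assumes the two strings printed above; the variable names are illustrative.

```python
from base64 import b64decode

# The two output lines shown above, copied verbatim.
hex_text = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'
b64_text = 'tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ='

raw_from_hex = bytes.fromhex(hex_text)  # hex text -> raw bytes
raw_from_b64 = b64decode(b64_text)      # base64 text -> raw bytes

print(len(raw_from_hex))             # 32
print(raw_from_hex == raw_from_b64)  # True
```

So the 64-character and 44-character strings are two spellings of the same 32-byte digest.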
Why your code didn't work:
base64.b64encode('[email protected]'.encode('utf-8')).decode() # superfluous utf-8
This:
Nowhere does it apply SHA256 hashing, nor does it create a hex representation, if you were expecting that. The end result doesn't match because it is the text representation of the base64 encoding of the original text's UTF-8 encoding, not the digest of its SHA256 hash.
Or perhaps I misunderstood and you already had the hex encoding, but you're putting that in as a string:
import base64
x = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'
base64.b64encode(x.encode()).decode()
That does indeed result in an 88-character base64 encoding, because you're not encoding the 32 bytes of the digest, you're encoding the 64 ASCII characters of its hex representation. That would have to be this instead:
x = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'
base64.b64encode(bytes.fromhex(x)).decode()
... and perhaps that is the answer you were looking for.
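Applied to a whole column, the same conversion might look like the sketch below. It assumes the column already holds hex digests as strings; `hex_column` is a plain list standing in for `xxx['yyy']`, and `hex_to_base64` is a hypothetical helper name.

```python
from base64 import b64encode


def hex_to_base64(hex_digest: str) -> str:
    """Convert a hex-encoded digest to the base64 text of the same bytes."""
    return b64encode(bytes.fromhex(hex_digest)).decode('ascii')


# Stand-in for xxx['yyy']; assumed to contain hex digests as strings.
hex_column = ['b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514']

b64_column = [hex_to_base64(x) for x in hex_column]
print(b64_column)  # ['tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ=']
```

If `xxx` is a pandas DataFrame (an assumption, the question does not say), passing `hex_to_base64` to `xxx['yyy'].map(...)` should give the same result as the list comprehension.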
Upvotes: 8