Qing

Reputation: 83

How to base64 encode a SHA256 hex character string

Hi, I need help getting a base64 encoded column. What I have is a SHA256 hashed column, and I am supposed to get 44 characters, but when I try this in Python:

[base64.b64encode(x.encode('utf-8')).decode() for x in xxx['yyy']]

it returns 88 characters. Can anyone help with this? Basically I want to reproduce in Python the steps shown in the pictures below. Thanks!

[images not preserved]

Upvotes: 4

Views: 22529

Answers (2)

hostingutilities.com

Reputation: 9509

This answer on the Cryptography Stack Exchange discusses why you're getting 64 characters. Basically, for historical reasons hashes are typically hex encoded, even though that takes 64 characters for a SHA256 digest while the same digest base64 encoded takes only 44. But if you need it base64 encoded, there is a way of doing it. The following will give you a base64 encoded hash:

from base64 import b64encode
from hashlib import sha256

email = '[email protected]'
email_as_bytes = email.encode('utf-8')
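hash_as_bytes = b64encode(sha256(email_as_bytes).digest())  # base64 of the raw 32-byte digest
hash_b64 = hash_as_bytes.decode('utf-8')  # 44-character str; name avoids shadowing the built-in hash()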

Since b64encode and sha256 both operate on bytes, we can chain them together, and the resulting code isn't too terrible.
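To see where the 64, 44 and 88 character counts come from, the lengths can be checked directly. This is only a minimal sketch; the input string `user@example.com` is a made-up placeholder, not the address used above.

```python
from base64 import b64encode
from hashlib import sha256

# Hypothetical input, chosen only for illustration.
text = 'user@example.com'

digest = sha256(text.encode('utf-8')).digest()  # raw digest: 32 bytes
hex_string = digest.hex()                       # 2 hex characters per byte

print(len(digest))                                 # 32
print(len(hex_string))                             # 64
print(len(b64encode(digest)))                      # 44: base64 of the 32 raw bytes
print(len(b64encode(hex_string.encode('ascii'))))  # 88: base64 of the 64 hex characters
```

Base64 emits 4 characters for every 3 input bytes (rounded up), so 32 bytes become 44 characters and 64 bytes become 88.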

Upvotes: 0

Grismar

Reputation: 31319

The step in the first image consists of a few substeps:

  • text is entered, which is just the character representation of a UTF-8 encoded byte string
  • SHA256 hashing is applied to that byte string
  • the resulting digest byte sequence is rendered in its hexadecimal representation

So:

from hashlib import sha256

s = '[email protected]'

h = sha256()
h.update(s.encode('utf-8'))  # specifying encoding, optional as this is the default
hex_string = h.digest().hex()
print(hex_string)

The second image seems to suggest it takes that hex representation as text again, and base64 encodes it - but really it takes the byte string represented by the hex string and encodes that.

So, starting with the hex string:

  • decode the hex to bytes (reconstructing the digest bytes)
  • encode the bytes using base64 into an ASCII byte string
  • decode the resulting byte string into characters for printing

from base64 import b64encode

digest_again = bytes.fromhex(hex_string)
b64bytes = b64encode(digest_again)
# no real need to specify 'ascii', the relevant code points overlap with UTF-8:
result = b64bytes.decode('ascii')
print(result)

Put together:

from hashlib import sha256
from base64 import b64encode

s = '[email protected]'

h = sha256()
h.update(s.encode())
print(h.digest().hex())

b64bytes = b64encode(h.digest())
print(b64bytes.decode())

Output:

b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514
tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ=
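As a sanity check, the base64 line can be decoded back into the hex line, which confirms that both are renderings of the same 32 bytes. A small sketch using the two output strings above:

```python
from base64 import b64decode

hex_string = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'
b64_string = 'tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ='

raw = b64decode(b64_string)      # back to the raw digest bytes
print(len(raw))                  # 32
print(raw.hex() == hex_string)   # True
```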

Why your code didn't work:

base64.b64encode('[email protected]'.encode('utf-8')).decode()  # 'utf-8' is superfluous, it is the default

This:

  • encodes the characters '[email protected]' into bytes using UTF-8
  • encodes that byte string using base64
  • decodes the resulting byte string into a character string

Nowhere does it apply SHA256 hashing, nor does it create a hex representation, if you were expecting that. The end result doesn't match because it is the base64 encoding of the original text's UTF-8 bytes, not the base64 encoding of its SHA256 digest.
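One way to see that no hashing took place: base64 is a reversible encoding, so decoding the result gives the original text back, which would not be possible from a SHA256 digest. A minimal sketch, with `user@example.com` as a made-up stand-in for the real address:

```python
from base64 import b64decode, b64encode

text = 'user@example.com'  # hypothetical input, for illustration only

encoded = b64encode(text.encode('utf-8')).decode('ascii')  # what the original code computes
decoded = b64decode(encoded).decode('utf-8')               # reverse the base64 step

print(decoded == text)  # True: the text is recovered, so nothing was hashed
```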

Or perhaps I misunderstood and you already had the hex encoding, but you're putting that in as a string:

x = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'
base64.b64encode(x.encode()).decode()

That does indeed result in an 88-character base64 encoding, because you're not encoding the digest bytes, you're encoding the hex representation of them. That would have to be this instead:

x = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'
base64.b64encode(bytes.fromhex(x)).decode()

... and perhaps that is the answer you were looking for.
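Applied to a whole column of hex strings, as in the question, the same conversion fits in a list comprehension. This is only a sketch: `hex_to_b64` is a hypothetical helper name, and the plain list stands in for the `xxx['yyy']` column.

```python
from base64 import b64encode

def hex_to_b64(hex_string):
    """Base64-encode the bytes that a hex string represents."""
    return b64encode(bytes.fromhex(hex_string)).decode('ascii')

# Stand-in for xxx['yyy']: any iterable of SHA256 hex strings.
hex_column = ['b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514']

b64_column = [hex_to_b64(x) for x in hex_column]
print(b64_column)  # ['tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ=']
```

If the data lives in a pandas Series, `xxx['yyy'].map(hex_to_b64)` should do the same thing, assuming every value in the column is a valid hex string.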

Upvotes: 8
