Reputation: 1393
So, I have a pandas DataFrame column whose dtype is reported as 'object', but which actually holds a Base64-encoded payload that I want to convert to hex.
raw
AIgIm/H/SfwAR2IBAgMgAgIAAQMCAFoAcQAAAAAAAAFxAAAAAAAAA4gAAABiAF8AABI=
AIgIm/v/SfsAUNwBAgMgAgIAEgMCAEIAcQAAAAAAAAFxAAAAAAAAA4gAAABkAF8AAAw=
AIgIm/z/Sg4AVroBAgMgAgIA6wMCAFgAcQAAAAAAAAFxAAAAAAAAA4geAAFEADoAGQs=
Using https://cryptii.com/base64-to-hex I get these values (which is what I expect):
new_raw
00 88 08 9b f1 ff 49 fc 00 47 62 01 02 03 20 02 02 00 01 03 02 00 5a 00 71 00 00 00 00 00 00 01 71 00 00 00 00 00 00 03 88 00 00 00 62 00 5f 00 00 12
00 88 08 9b fb ff 49 fb 00 50 dc 01 02 03 20 02 02 00 12 03 02 00 42 00 71 00 00 00 00 00 00 01 71 00 00 00 00 00 00 03 88 00 00 00 64 00 5f 00 00 0c
00 88 08 9b fc ff 4a 0e 00 56 ba 01 02 03 20 02 02 00 eb 03 02 00 58 00 71 00 00 00 00 00 00 01 71 00 00 00 00 00 00 03 88 1e 00 01 44 00 3a 00 19 0b
Based on similar questions previously asked, I've tried:
df['new_raw'] = df['raw'].apply(lambda x: x.decode("base64").encode("hex"))
But this gives me:
AttributeError: 'str' object has no attribute 'decode'
Upvotes: 2
Views: 2599
Reputation: 1123590
You have Python 3 string objects, which do not have a .decode() method. Decoding is what you do to bytes values to get strings; strings are what you would encode. It appears you found some Python 2-specific code for the conversion, which is not compatible with Python 3.
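To make the str/bytes distinction concrete, here is a small sketch using only built-ins. The variable names are illustrative, and the sample text is just the start of the first payload from the question:

```python
# A Python 3 str is Unicode text; this is what the DataFrame column holds.
text = "AIgIm/H/SfwA"

# str -> bytes is *encoding*; bytes -> str is *decoding*.
data = text.encode("ascii")
round_trip = data.decode("ascii")

# str has .encode() but no .decode(), hence the AttributeError in the question.
has_decode = hasattr(text, "decode")
print(type(data).__name__, round_trip == text, has_decode)
```

On Python 3 the last check is False, which is why calling x.decode("base64") on a column of strings fails.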
To convert Base64 data to binary, then on to hex, use the base64 module, then call the .hex() method on the result:
import base64
df['raw'].apply(lambda b: base64.b64decode(b).hex())
Demo:
>>> import pandas as pd
>>> import base64
>>> df = pd.DataFrame({'raw': '''\
... AIgIm/H/SfwAR2IBAgMgAgIAAQMCAFoAcQAAAAAAAAFxAAAAAAAAA4gAAABiAF8AABI=
... AIgIm/v/SfsAUNwBAgMgAgIAEgMCAEIAcQAAAAAAAAFxAAAAAAAAA4gAAABkAF8AAAw=
... AIgIm/z/Sg4AVroBAgMgAgIA6wMCAFgAcQAAAAAAAAFxAAAAAAAAA4geAAFEADoAGQs=
... '''.splitlines()})
>>> df['raw'].apply(lambda b: base64.b64decode(b).hex())
0    0088089bf1ff49fc00476201020320020200010302005a...
1    0088089bfbff49fb0050dc010203200202001203020042...
2    0088089bfcff4a0e0056ba01020320020200eb03020058...
Name: raw, dtype: object
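The expected values in the question separate each byte with a space and are stored in a new_raw column. A minimal sketch of that variant, assuming Python 3.8 or newer (where bytes.hex() accepts a separator argument); the helper name b64_to_spaced_hex is made up for illustration:

```python
import base64

import pandas as pd

df = pd.DataFrame({"raw": [
    "AIgIm/H/SfwAR2IBAgMgAgIAAQMCAFoAcQAAAAAAAAFxAAAAAAAAA4gAAABiAF8AABI=",
    "AIgIm/v/SfsAUNwBAgMgAgIAEgMCAEIAcQAAAAAAAAFxAAAAAAAAA4gAAABkAF8AAAw=",
    "AIgIm/z/Sg4AVroBAgMgAgIA6wMCAFgAcQAAAAAAAAFxAAAAAAAAA4geAAFEADoAGQs=",
]})


def b64_to_spaced_hex(value: str) -> str:
    """Decode a Base64 string and return its bytes as space-separated hex pairs."""
    # The separator argument to bytes.hex() requires Python 3.8+.
    return base64.b64decode(value).hex(" ")


# Store the result in a new column instead of only displaying it.
df["new_raw"] = df["raw"].apply(b64_to_spaced_hex)
print(df["new_raw"][0])
```

On older Python versions, one alternative is to join the pairs manually, for example " ".join(f"{byte:02x}" for byte in base64.b64decode(value)).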
Upvotes: 2