Reputation: 833
I have a CSV with ~10 columns. One of the columns holds data written as a bytes literal, e.g. b'gAAAA234'. But when I read the file with pandas via .read_csv("file.csv"), I get a dataframe in which this column comes back as the string "b'gAAAA234'" rather than the bytes b'gAAAA234'.
How do I read it as bytes directly, without having to read it as a string and then convert?
Currently, I'm working with this:
b = df['column_with_data_in_bytes'][i]
# strip the leading "b'" and trailing "'" from the string, then re-encode the rest
bb = bytes(b[2:len(b) - 1], 'utf-8')
# further processing of bytes
This works, but I was hoping to find a more elegant/Pythonic or more reliable way to do it.
Upvotes: 1
Views: 1388
Reputation: 402593
You might consider parsing the column with ast.literal_eval:
import ast
df['column_with_data_in_bytes'] = df['column_with_data_in_bytes'].apply(ast.literal_eval)
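If the goal is to skip the intermediate string step entirely, the same parser can also be applied while the file is being read, via read_csv's converters argument. A minimal sketch, assuming the file and column names from the question:
import ast
import pandas as pd

# Run ast.literal_eval on each cell of the column as the CSV is parsed,
# so the column arrives holding bytes objects instead of strings.
df = pd.read_csv(
    "file.csv",
    converters={"column_with_data_in_bytes": ast.literal_eval},
)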
Demo:
In [322]: df = pd.DataFrame({'Col' : ["b'asdfghj'", "b'ssdgdfgfv'", "b'asdsfg'"]})
In [325]: df
Out[325]:
            Col
0    b'asdfghj'
1  b'ssdgdfgfv'
2     b'asdsfg'
In [326]: df.Col.apply(ast.literal_eval)
Out[326]:
0      asdfghj
1    ssdgdfgfv
2       asdsfg
Name: Col, dtype: object
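Although this output prints the converted values without the b'' prefix, each element really is a bytes object after the conversion; a quick check against the demo frame:
parsed = df.Col.apply(ast.literal_eval)
print(type(parsed.iloc[0]))   # <class 'bytes'>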
Upvotes: 3