Reputation: 43
import pandas as pd
#Define Dataframe
d = {'cola': ['cola1', 'cola2', 'cola3', 'cola4', 'cola4']
, 'colb': [b'colb1', b'colb2', b'colb3', b'colb4', b'colb4']
, 'colc': ['colc1', 'colc2', 'colc3', 'colc4', 'colc4']
, 'cold': [b'cold1', b'cold2', b'cold3', b'cold4', b'cold4']
}
df = pd.DataFrame(data=d)
#Create flatfile from dataframe
df.to_csv('converted_file.txt', sep='|',index=False)
I would like to convert the bytes to strings i.e. remove 'b' prefix before creating the output file.
I tried the solution mentioned here: How to translate "bytes" objects into literal strings in pandas Dataframe, Python3.x?
str_df = df.select_dtypes([np.object])
str_df = str_df.stack().str.decode('utf-8').unstack()
for col in str_df:
df[col] = str_df[col]
Although it works for columns [colb] and [cold], columns [cola] and [colc] are blank. This is mainly because all 4 columns are of type object. I am not sure how to select only columns [colb] and [cold] implicitly and then apply the decode function. These two columns need to be selected implicitly for decoding as the dataframe is created from an output of a SQL query.
Has anyone encountered this before and can suggest a solution?
Thanks in advance!
Upvotes: 4
Views: 6559
Reputation: 210832
Demo:
In [12]: df
Out[12]:
cola colb colc cold
0 cola1 b'colb1' colc1 b'cold1'
1 cola2 b'colb2' colc2 b'cold2'
2 cola3 b'colb3' colc3 b'cold3'
3 cola4 b'colb4' colc4 b'cold4'
4 cola4 b'colb4' colc4 b'cold4'
In [13]: df.applymap(lambda x: x.decode() if isinstance(x, bytes) else x)
Out[13]:
cola colb colc cold
0 cola1 colb1 colc1 cold1
1 cola2 colb2 colc2 cold2
2 cola3 colb3 colc3 cold3
3 cola4 colb4 colc4 cold4
4 cola4 colb4 colc4 cold4
Upvotes: 11