Convert bytes to string implicitly in Pandas dataframe (Python 3.6)

Question

import pandas as pd
#Define Dataframe

d = {'cola': ['cola1', 'cola2', 'cola3', 'cola4', 'cola4']
    , 'colb': [b'colb1', b'colb2', b'colb3', b'colb4', b'colb4']
    , 'colc': ['colc1', 'colc2', 'colc3', 'colc4', 'colc4']
    , 'cold': [b'cold1', b'cold2', b'cold3', b'cold4', b'cold4']
  }
df = pd.DataFrame(data=d)

#Create flatfile from dataframe
df.to_csv('converted_file.txt', sep='|',index=False)

I would like to convert the bytes to strings i.e. remove 'b' prefix before creating the output file.

I tried the solution mentioned here: How to translate "bytes" objects into literal strings in pandas Dataframe, Python3.x?

str_df = df.select_dtypes([np.object])
str_df = str_df.stack().str.decode('utf-8').unstack()
for col in str_df:
    df[col] = str_df[col]

Although it works for columns [colb] and [cold], columns [cola] and [colc] are blank. This is mainly because all 4 columns are of type object. I am not sure how to select only columns [colb] and [cold] implicitly and then apply the decode function. These two columns need to be selected implicitly for decoding as the dataframe is created from an output of a SQL query.

Has anyone encountered this before and can suggest a solution?

Thanks in advance!

MaxU - stand with Ukraine · Accepted Answer

Demo:

In [12]: df
Out[12]:
    cola      colb   colc      cold
0  cola1  b'colb1'  colc1  b'cold1'
1  cola2  b'colb2'  colc2  b'cold2'
2  cola3  b'colb3'  colc3  b'cold3'
3  cola4  b'colb4'  colc4  b'cold4'
4  cola4  b'colb4'  colc4  b'cold4'

In [13]: df.applymap(lambda x: x.decode() if isinstance(x, bytes) else x)
Out[13]:
    cola   colb   colc   cold
0  cola1  colb1  colc1  cold1
1  cola2  colb2  colc2  cold2
2  cola3  colb3  colc3  cold3
3  cola4  colb4  colc4  cold4
4  cola4  colb4  colc4  cold4

Convert bytes to string implicitly in Pandas dataframe (Python 3.6)

Answers (1)

Related Questions