Reputation: 13
I have exported a comma-separated values file from an MS SQL database (the file has an .rpt extension). It has only two columns and 8 rows. When I open the file in Notepad, everything looks OK. I tried to load the data into a pandas DataFrame using the code below:
import pandas as pd
with open('file.csv', 'r') as csvfile:
    df_data = pd.read_csv(csvfile, sep=',', encoding='utf-8')
print(df_data)
When I print to the console, the header name of the first column is wrong: it has some extra characters at the start. I get no errors, but the first column header is obviously decoded incorrectly by my code. [Image of output]
Does anyone have any ideas on how to get this right?
Upvotes: 1
Views: 2907
Reputation: 402463
Here's one possible option: fix the headers after loading the data:
df_data.columns = [x.encode('utf-8').decode('ascii', 'ignore') for x in df_data.columns]
The str.encode call followed by the str.decode call will drop those special characters, leaving behind only the ones in the ASCII range:
>>> '\ufeffaSA'.encode('utf-8').decode('ascii', 'ignore')
'aSA'
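As a minimal, self-contained sketch of the same idea, the snippet below applies the encode/decode round trip to a small DataFrame. The column names, the sample values, and the helper name strip_non_ascii are made up for illustration, and the stray character is assumed to be a byte-order mark ('\ufeff'); the characters in your file may differ.

```python
import pandas as pd


def strip_non_ascii(name: str) -> str:
    """Drop every character outside the ASCII range from a column name."""
    # Non-ASCII characters become multi-byte UTF-8 sequences, which the
    # ASCII decoder then skips because of the 'ignore' error handler.
    return name.encode("utf-8").decode("ascii", "ignore")


# Hypothetical data: the first header starts with a stray '\ufeff' character.
df_data = pd.DataFrame({"\ufeffid": [1, 2], "name": ["foo", "bar"]})

# Apply the fix to all headers at once.
df_data.columns = [strip_non_ascii(col) for col in df_data.columns]

print(list(df_data.columns))  # ['id', 'name']
```

Note that this removes every non-ASCII character, so headers that legitimately contain accented letters would lose them as well. If the extra characters turn out to be a UTF-8 byte-order mark, reading the file with encoding='utf-8-sig' may be worth trying too.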
Upvotes: 3