Reputation: 223
I am trying to import data and am running into encoding issues.
Sometimes utf-8 works but latin-1 does not, and sometimes the reverse; which one works depends on the data coming in.
The encodings in use are latin-1, utf-8 and windows-1252.
Code:
import pandas as pd
df = pd.read_csv(dir_in + file_note, sep='|',
                 low_memory=False, header=0,
                 on_bad_lines='skip',  # pandas >= 1.3; replaces error_bad_lines=False, warn_bad_lines=False
                 encoding="windows-1252")
How can I make this code dynamic, so that if reading with one encoding raises an error, it automatically tries the next one? The priorities are:
1) Priority one will be utf-8
2) Priority two will be latin-1
3) Priority three will be windows-1252
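Whether an encoding "gives an error" comes down to Python's standard codecs, which are what `pd.read_csv` accepts for its `encoding` argument. The short check below uses plain `bytes.decode`, independent of pandas, and the helper name `decodes` is purely illustrative. It shows the behaviour that matters for this priority list: strict utf-8 rejects many byte sequences, latin-1 maps all 256 byte values to characters and so never fails, and windows-1252 (Python's `cp1252`) leaves a few byte values, such as 0x81, undefined.

```python
def decodes(raw: bytes, encoding: str) -> bool:
    """Return True if `raw` decodes under `encoding` without an error."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


every_byte = bytes(range(256))  # all 256 possible byte values

print(decodes(b"caf\xe9", "utf-8"))      # False: a lone 0xE9 is invalid UTF-8
print(decodes(every_byte, "latin-1"))    # True: latin-1 maps every byte to a character
print(decodes(b"\x81", "windows-1252"))  # False: 0x81 is undefined in cp1252
```

Because latin-1 cannot fail with a decoding error, an encoding tried after it is never reached; if windows-1252 should get a chance, it has to come before latin-1.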
Upvotes: 0
Views: 151
Reputation: 1836
The same question was raised on the pandas GitHub page:
Is there a way pandas can read a csv file and find out the encoding automatically? Or is there a fix for this? Maybe as a feature (if it does not exist yet) in a future release?
One contributor replied:
This seems out of scope for pandas. I'd recommend using a library like chardet to determine the encoding ahead of time.
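Following that suggestion, a minimal sketch could let chardet guess the encoding from the start of the file and hand the guess to `read_csv`. It assumes the third-party `chardet` package (`pip install chardet`), whose `chardet.detect(bytes)` returns a dict with an `encoding` key that may be `None`. The helper names `guess_encoding` and `read_csv_detected`, the 100 000-byte sample and the utf-8 default are illustrative assumptions, and the sketch simply uses that default if chardet is not installed.

```python
import pandas as pd

try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None  # assumption: fall back to the default encoding below


def guess_encoding(path, sample_size=100_000, default="utf-8"):
    """Guess a file's encoding from its first `sample_size` bytes."""
    if chardet is None:
        return default
    with open(path, "rb") as fh:
        sample = fh.read(sample_size)
    # chardet.detect returns a dict such as {'encoding': 'utf-8', 'confidence': 0.99, ...};
    # 'encoding' is None when no guess is possible (e.g. an empty sample).
    return chardet.detect(sample)["encoding"] or default


def read_csv_detected(path, **kwargs):
    """Read `path` with pandas, using the guessed encoding."""
    return pd.read_csv(path, encoding=guess_encoding(path), **kwargs)
```

It would be called as `read_csv_detected(dir_in + file_note, sep='|', header=0)`. The result is only a guess based on the sample: a file whose first bytes happen to be plain ASCII can still contain non-ASCII bytes further down, so detection combines well with a try-and-fall-back approach.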
Upvotes: 0
Reputation: 700
Not very beautiful, but it works.
import pandas as pd
try:
    df = pd.read_csv(dir_in + file_note, sep='|',
                     low_memory=False, header=0,
                     on_bad_lines='skip',  # pandas >= 1.3; replaces error_bad_lines/warn_bad_lines
                     encoding="utf-8")
except UnicodeDecodeError:
    try:
        df = pd.read_csv(dir_in + file_note, sep='|',
                         low_memory=False, header=0,
                         on_bad_lines='skip',
                         encoding="latin-1")
    except UnicodeDecodeError:
        # Not reached for decoding problems in practice: latin-1 accepts every byte.
        df = pd.read_csv(dir_in + file_note, sep='|',
                         low_memory=False, header=0,
                         on_bad_lines='skip',
                         encoding="windows-1252")
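The nested `try` blocks generalize to a loop over the encodings in priority order. This is a minimal sketch, assuming pandas is installed: the helper name `read_csv_with_fallback` is hypothetical, and it catches only `UnicodeDecodeError`, so that other failures (a missing file, a parser error) surface immediately instead of being mistaken for an encoding problem. Extra keyword arguments are passed straight through to `pd.read_csv`.

```python
import pandas as pd

# Priority order from the question. latin-1 decodes every byte value,
# so any encoding listed after it will never be tried.
ENCODINGS = ("utf-8", "latin-1", "windows-1252")


def read_csv_with_fallback(path, encodings=ENCODINGS, **kwargs):
    """Try each encoding in order and return the first DataFrame that decodes."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs)
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    if last_error is None:
        raise ValueError("encodings must not be empty")
    raise last_error  # every encoding failed: re-raise the last decoding error
```

For the file in the question it would be called as `read_csv_with_fallback(dir_in + file_note, sep='|', low_memory=False, header=0, on_bad_lines='skip')`; `on_bad_lines` requires pandas 1.3 or later.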
Upvotes: 1