Dr.Chuck

Reputation: 223

Text data encoding error for csv format in Python

I am trying to import data and am running into encoding issues.

Sometimes utf-8 works but latin-1 does not, and sometimes the reverse; it depends on the data coming in.

Encodings I have tried: latin-1, utf-8, windows-1252

Code:

pd.read_csv(dir_in+file_note,sep='|',
              low_memory=False,header=0,
              error_bad_lines=False,
              encoding = "windows-1252",
              warn_bad_lines=False)

Please advise how to make the code dynamic, so that if one encoding raises an error it tries the next one.

1) Priority one will be utf-8

2) Priority two will be latin-1

3) Priority three will be windows-1252

Upvotes: 0

Views: 151

Answers (2)

Kate

Reputation: 1836

The question was asked on the GitHub dev page:

Is there a way pandas can read a csv file and find out the encoding automatically? Or is there a fix to this? Maybe this could be a feature (if it does not exist yet) in a future release?

One contributor says:

This seems out of scope for pandas. I'd recommend using a library like chardet to determine the encoding ahead of time.
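
A minimal sketch of what "determine the encoding ahead of time" could look like with chardet. The helper name `guess_encoding` and the `sample_size` default are assumptions made for illustration; only `chardet.detect` comes from the library itself.

```python
import chardet  # third-party package: pip install chardet


def guess_encoding(path, sample_size=100_000):
    """Return (encoding, confidence) guessed from the first bytes of a file.

    `guess_encoding` and `sample_size` are hypothetical names for this sketch.
    """
    # Read raw bytes so no decoding happens before the guess is made.
    with open(path, "rb") as fh:
        raw = fh.read(sample_size)
    # chardet.detect returns a dict with "encoding" and "confidence" keys.
    result = chardet.detect(raw)
    return result["encoding"], result["confidence"]
```

The returned name could then be passed as `encoding=` to `pd.read_csv`. The guess is statistical, so it can be wrong or `None` (for example on an empty file), and it is probably worth keeping a fallback for low-confidence results.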

Upvotes: 0

incarnadine

Reputation: 700

Not very beautiful, but it works.

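import pandas as pd

# Try the encodings in priority order and keep the first result that decodes.
# error_bad_lines / warn_bad_lines are kept from the question; pandas 1.3+
# replaces both with on_bad_lines="skip".
try:
  df = pd.read_csv(dir_in+file_note,sep='|',
                   low_memory=False,header=0,
                   error_bad_lines=False,
                   encoding = "utf-8",
                   warn_bad_lines=False)
except UnicodeDecodeError:
  try:
    df = pd.read_csv(dir_in+file_note,sep='|',
                     low_memory=False,header=0,
                     error_bad_lines=False,
                     encoding = "latin-1",
                     warn_bad_lines=False)
  except UnicodeDecodeError:
    df = pd.read_csv(dir_in+file_note,sep='|',
                     low_memory=False,header=0,
                     error_bad_lines=False,
                     encoding = "windows-1252",
                     warn_bad_lines=False)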
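
The nested try/except above can also be written as a loop over the candidate encodings. This is only a sketch: `read_with_fallback` and its `reader` parameter are hypothetical names, not part of pandas. One assumption differs from the question: latin-1 maps every possible byte to a character, so decoding with it never raises, and anything listed after it is never tried. For that reason the default order here puts windows-1252 before latin-1.

```python
def read_with_fallback(path, reader,
                       encodings=("utf-8", "windows-1252", "latin-1"),
                       **kwargs):
    """Call reader(path, encoding=..., **kwargs) with each encoding in turn.

    Returns (result, encoding_used). `reader` is any callable that accepts
    an `encoding` keyword, for example pandas.read_csv.
    """
    last_error = None
    for enc in encodings:
        try:
            return reader(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as exc:
            # Only decoding failures trigger the next candidate;
            # other errors (missing file, parse errors) propagate as usual.
            last_error = exc
    raise ValueError(f"none of {list(encodings)} could decode {path!r}") from last_error


# Usage with pandas (assumes dir_in and file_note from the question):
# import pandas as pd
# df, used = read_with_fallback(dir_in + file_note, pd.read_csv,
#                               sep="|", low_memory=False, header=0)
```

Note that a successful decode does not prove the encoding is right. latin-1 and windows-1252 will both accept most byte sequences, so a quick look at accented characters in the result is still worthwhile.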

Upvotes: 1
