Reputation: 349
I am trying to load datasets from the FDIC. Every quarter, the FDIC releases a zip file that contains around 62 csv files with names like the following:
All_Reports_20080331_Assets and Liabilities.csv,
All_Reports_20080331_Bank Assets Sold and Securitized.csv,
etc
I have downloaded all the files into a directory like the following:
C:\projects\FDIC\All_Reports_20080331
Since there are many zip files available from different quarters, I am preparing a loop that will run over many paths (each one representing a quarter with 62 csv files inside). Before getting to the loop, however, loading even a single file fails with a utf-8 error.
import pandas as pd
path = r"C:\projects\FDIC\All_Reports_20080331"
file = r"\All_Reports_20080331_Assets and Liabilities.csv"
# "&" is not allowed in a Python identifier, so the name is spelled out
df_assets_and_liab = pd.read_csv(path + file)
gives me the following error:
'utf-8' codec can't decode byte 0xfc in position 5: invalid start byte
I tried setting the encoding parameter of pandas.read_csv to "utf_8", but the error message is the same.
Any ideas on how to load these files with pandas? Thanks a lot!
PS: the folder with the 62 csv files can be found here: FDIC Website
Upvotes: 0
Views: 1094
Reputation: 6166
First, check which encoding the file actually uses.
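import chardet

# path and file are the variables defined in the question
with open(path + file, "rb") as f:
    data = f.read()  # read raw bytes so nothing is decoded yet
print(chardet.detect(data))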
{'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
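This result is consistent with the error message: byte 0xfc is not a valid start byte in UTF-8, but in ISO-8859-1 it is the character "ü". A minimal sketch of the difference, using a made-up byte string (the sample text is an assumption, not data from the FDIC files):

```python
# Hypothetical sample containing the byte 0xfc from the error message
raw = b"M\xfcnchen"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    # UTF-8 rejects 0xfc as a start byte, which is what pandas reported
    utf8_ok = False
    print(exc)

# ISO-8859-1 maps every single byte to a character, so decoding succeeds
latin1_text = raw.decode("ISO-8859-1")
print(latin1_text)
```

Note that the detector reported a confidence of only 0.73, so the encoding is a best guess rather than a certainty.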
Then pass the detected encoding to read_csv:
df_assets_and_liab = pd.read_csv(path + file, encoding='ISO-8859-1')
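For the loop over a whole quarter mentioned in the question, something like the following might work. It is only a sketch: the helper name load_quarter is hypothetical, and it assumes that every csv file in the folder uses the same encoding as the one checked above, which has not been verified for all 62 files.

```python
from pathlib import Path

import pandas as pd


def load_quarter(folder, encoding="ISO-8859-1"):
    """Read every csv file in one quarter's folder into a dict of DataFrames.

    Keys are the file names without the .csv extension.
    """
    frames = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        frames[csv_path.stem] = pd.read_csv(csv_path, encoding=encoding)
    return frames


# Example call, using the folder from the question:
# frames = load_quarter(r"C:\projects\FDIC\All_Reports_20080331")
# df = frames["All_Reports_20080331_Assets and Liabilities"]
```

Because ISO-8859-1 accepts any byte sequence, this will not raise a decode error even if a file is really in another encoding. It is worth spot-checking a few text columns for garbled characters.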
Upvotes: 1