Reputation: 55
I am trying to import multiple CSVs into dataframes and load them into a MySQL database using pandas `to_sql`. After creating the engine, I am running the following:
folder_path = (file_path)
os.chdir(folder_path)
for file in os.listdir(folder_path):
    if '.csv' in file:
        df = pd.read_csv(file, low_memory=False)
        table_name = str(file.strip('.csv'))
        df.to_sql(table_name, con=engine, if_exists='replace')
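(As a side note, a sketch of a pitfall in the loop above: `str.strip('.csv')` treats `'.csv'` as a *set* of characters to trim from both ends of the name, not as a suffix, so some table names come out mangled. The filename below is a hypothetical example; `os.path.splitext` is the safer way to drop the extension.)

```python
import os

# str.strip('.csv') strips any of the characters '.', 'c', 's', 'v'
# from BOTH ends of the string, which can eat parts of the name:
print("sales.csv".strip(".csv"))         # leading 's' and trailing '.csv' removed -> 'ale'

# os.path.splitext splits off only the final extension:
print(os.path.splitext("sales.csv")[0])  # -> 'sales'
```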
However, when I run the code, I get the following error: "UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to &lt;undefined&gt;"
Even when I try using the import wizard to upload the specific table the error appears on, it only imports 50 of the 42,000 records.
Any help is appreciated!
Upvotes: 0
Views: 813
Reputation: 55
I am not sure if this is the "correct" way of doing it, but I found a regex that keeps only ASCII characters (the `\x00`-`\x7F` range), removing the rest, for each field in the dataframe:
df.replace({r'[^\x00-\x7F]+':''}, regex=True, inplace=True)
Ideally, though, I would like to keep the non-ASCII characters, so I am open to any other solutions.
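To illustrate what that `replace` call does, here is a minimal sketch on a hypothetical dataframe (the column and values are made up for the example):

```python
import pandas as pd

# Hypothetical sample data containing accented (non-ASCII) characters.
df = pd.DataFrame({"name": ["café", "naïve", "plain"]})

# Remove every run of characters outside the ASCII range 0x00-0x7F.
df.replace({r"[^\x00-\x7F]+": ""}, regex=True, inplace=True)

print(df["name"].tolist())  # ['caf', 'nave', 'plain']
```

Note that the accented characters are dropped entirely rather than transliterated, which is why the answer above mentions wanting a way to keep them.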
Upvotes: 1