shuaf98

Reputation: 55

Problem Importing to MYSQL with Pandas: UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>

I am trying to import multiple CSV files into a MySQL database in one run, loading each file into a DataFrame and writing it with pandas to_sql. After creating the engine, I run the following:

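import os
import pandas as pd

folder_path = file_path  # directory that holds the CSV files
os.chdir(folder_path)
for file in os.listdir(folder_path):
    if file.endswith('.csv'):
        df = pd.read_csv(file, low_memory=False)
        # splitext drops only the extension; strip('.csv') would also eat leading/trailing '.', 'c', 's', 'v'
        table_name = os.path.splitext(file)[0]
        df.to_sql(table_name, con=engine, if_exists='replace')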

However, when I run the code, I get the following error: "UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>"
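For context, the 'charmap' wording is what Python reports when a string is encoded with a table-based codec that has no slot for some character. The sketch below reproduces the message in isolation. It assumes the failing codec is a Windows code page such as cp1252, which the question does not confirm, and the sample text and the `try_encode` helper are made up for illustration.

```python
# Two characters outside the cp1252 repertoire (hypothetical sample data).
text = "\u6570\u636e"


def try_encode(value: str, encoding: str) -> str:
    """Return 'ok' if value can be encoded, otherwise the error message."""
    try:
        value.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return str(exc)


# cp1252 is an assumed stand-in for whatever codec fails in the question.
cp1252_result = try_encode(text, "cp1252")
utf8_result = try_encode(text, "utf-8")

print(cp1252_result)
print(utf8_result)
```

If this matches the situation, the data itself is fine and the problem lies in which encoding is used somewhere between pandas and MySQL.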

Even when I use the import wizard to upload the specific table that triggers the error, it imports only 50 of the 42,000 records.

Any help is appreciated!

Upvotes: 0

Views: 813

Answers (1)

shuaf98

Reputation: 55

I am not sure if this is the "correct" way of doing it, but I found a regex that keeps only ASCII characters and removes everything else from each field in the DataFrame:

df.replace({r'[^\x00-\x7F]+':''}, regex=True, inplace=True)
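To make the effect concrete, here is a small sketch of that replacement on a made-up DataFrame (the column names and values are hypothetical). The pattern `[^\x00-\x7F]+` matches runs of characters outside the ASCII range, so accented letters are dropped, not transliterated.

```python
import pandas as pd

# Hypothetical sample data with a few non-ASCII characters.
df = pd.DataFrame({
    "name": ["caf\u00e9", "na\u00efve", "plain"],
    "count": [1, 2, 3],
})

# Same regex as above, but returning a new frame instead of modifying in place.
cleaned = df.replace({r"[^\x00-\x7F]+": ""}, regex=True)

print(cleaned["name"].tolist())
```

Note that this is lossy: "café" and "cafe" become indistinguishable only if the accent is the sole difference, and here the accented letter disappears entirely.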

Ideally, though, I would like to keep the non-ASCII characters, so I am open to other solutions.
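One direction that might keep those characters is to request a UTF-8 connection charset when building the engine URL. This is only a sketch and not a confirmed fix for this error: it assumes SQLAlchemy 1.4 or later and the PyMySQL driver, and the username, password, host, and database values are placeholders.

```python
from sqlalchemy.engine import URL

# Placeholder credentials and names; substitute real values.
url = URL.create(
    "mysql+pymysql",          # assumed driver, not stated in the question
    username="user",
    password="secret",
    host="localhost",
    database="mydb",
    query={"charset": "utf8mb4"},  # ask the driver for a full UTF-8 connection
)

# The engine would then be created with sqlalchemy.create_engine(url)
# and passed to df.to_sql(..., con=engine) as before.
print(url.query["charset"])
```

The target tables and columns would also need a character set that can store the data (for example utf8mb4) for this to help.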

Upvotes: 1
