Reputation: 850
I have a file containing some UTF-8 characters. I loaded it into a DataFrame (with the encoding explicitly set to utf-8) and am now trying to write it out to a CSV. I keep getting a UnicodeEncodeError, and I'm not sure why. I've set encoding='utf-8' (also tried encoding='utf8') and I still get it, with a reference to the 'ascii' codec.
One clue is I don't get the issue when testing this on a Windows machine, but I do get it on an Ubuntu machine.
I've tried upgrading pandas from 0.25 -> 1.0 and it makes no difference.
Note also that this is being used within Django.
df.to_csv(f, index=False, line_terminator='\n', encoding='utf-8')
  File "/home/webapp/.virtualenvs/django/lib/python3.6/site-packages/pandas/core/generic.py", line 3203, in to_csv
    formatter.save()
  File "/home/webapp/.virtualenvs/django/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 204, in save
    self._save()
  File "/home/webapp/.virtualenvs/django/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 323, in _save
    self._save_chunk(start_i, end_i)
  File "/home/webapp/.virtualenvs/django/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 354, in _save_chunk
    libwriters.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)
  File "pandas/_libs/writers.pyx", line 65, in pandas._libs.writers.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character '\xc7' in position 67: ordinal not in range(128)
Upvotes: 0
Views: 329
Reputation: 850
Part of the problem, it turns out, stems from Apache/mod_wsgi, which defaults to an ANSI (effectively ASCII) preferred encoding rather than UTF-8. I found a good discussion and solution here. The fix is to set the locale of the mod_wsgi daemon process to a UTF-8 one, with something like this in the Apache conf file:
WSGIDaemonProcess my-django-site lang='en_US.UTF-8' locale='en_US.UTF-8'
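To check whether the setting took effect, something like the sketch below can be run inside the Django process (for example from a temporary view or a management shell started the same way). It only uses the standard library; encoding_report is a hypothetical helper name, not part of Django or mod_wsgi.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings this Python process uses by default.

    encoding_report is a hypothetical helper for illustration only.
    """
    return {
        # Used by open() in text mode when no encoding= is given.
        "preferred": locale.getpreferredencoding(False),
        # Used to encode and decode file names.
        "filesystem": sys.getfilesystemencoding(),
        # Default for str.encode() and bytes.decode().
        "default": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

If "preferred" reports an ASCII-style name rather than UTF-8, files opened without an explicit encoding in that process will most likely fail on non-ASCII characters.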
I'm still not sure why the encoding parameter of to_csv() doesn't override the defaults.
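One likely explanation, assuming f was a text-mode handle opened without an explicit encoding (the question doesn't show how f was created): when to_csv() is handed an already-open text handle, the handle's own encoding is what encodes the characters, and the encoding argument only matters when pandas opens the file itself from a path. The sketch below imitates that with the standard library alone; write_text and the sample row are made up for illustration.

```python
import io


def write_text(raw, text, encoding):
    """Write text to a binary buffer through a text wrapper.

    The wrapper's encoding plays the role of the encoding an
    open() call would pick up, e.g. from the process locale.
    """
    handle = io.TextIOWrapper(raw, encoding=encoding, newline="")
    handle.write(text)
    handle.flush()
    handle.detach()  # leave the underlying buffer open for reading
    return raw.getvalue()


# Sample data containing 'Ç' (U+00C7), the character from the traceback.
row = "name,city\nÇelik,Ankara\n"

# A handle with an ASCII encoding (as under an ASCII locale) cannot write it.
try:
    write_text(io.BytesIO(), row, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The same write succeeds once the handle itself is UTF-8.
utf8_bytes = write_text(io.BytesIO(), row, "utf-8")
```

If that assumption holds, opening the handle with open(path, 'w', encoding='utf-8', newline='') before calling to_csv(), or passing the path to to_csv() directly, should avoid the error regardless of the server's locale.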
Upvotes: 1