Luluperam

Reputation: 53

python script not encoding to utf-8

I have this Python 3 script that reads a JSON file and saves it as CSV. It works fine except for special characters like \u00e9. Montr\u00e9al should come out as Montréal, but I am getting MontrÃ©al instead.

import csv
import json

ifilename = 'business.json'
ofilename = 'business.csv'

# Read the input as UTF-8; each non-empty line is one JSON object
with open(ifilename, encoding='utf-8') as in_file:
    json_lines = [json.loads(l) for l in in_file if l.strip()]

# The with-block closes the output file automatically
with open(ofilename, "w", newline='', encoding='utf-8') as out_file:
    root = csv.writer(out_file)
    root.writerow(["business_id", "name", "neighborhood", "address", "city", "state"])
    json_no = 0
    for l in json_lines:
        root.writerow([l["business_id"], l["name"], l["neighborhood"], l["address"], l["city"], l["state"]])
        json_no += 1

print('Finished {0} lines'.format(json_no))
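The \u00e9 escape itself is not the problem: json.loads already turns it into é. The garbling appears when the UTF-8 output is read back with a different encoding. The sketch below assumes the viewer falls back to Windows-1252 (cp1252), which is likely but not confirmed in the question; the JSON line is hypothetical sample data.

```python
import json

# Hypothetical JSON line in the style of business.json
line = '{"city": "Montr\\u00e9al"}'
city = json.loads(line)["city"]
print(city)  # Montréal: json.loads has already decoded \u00e9 to é

# encoding='utf-8' stores é as the two bytes C3 A9
utf8_bytes = city.encode("utf-8")
print(utf8_bytes)  # b'Montr\xc3\xa9al'

# A reader that assumes cp1252 treats those two bytes as two characters
misread = utf8_bytes.decode("cp1252")
print(misread)  # MontrÃ©al
```

So the CSV file is correct; only the program displaying it guesses the wrong encoding.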

Upvotes: 0

Views: 733

Answers (2)

Luluperam

Reputation: 53

It turns out the CSV file displayed correctly when opened in Notepad++ but not in Excel. I had to import the CSV file into Excel and choose the file origin 65001: Unicode (UTF-8). Thanks for the help.
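An alternative that avoids the manual import step is to write the file with Python's utf-8-sig codec, which prepends a UTF-8 byte order mark (EF BB BF); Excel uses that mark to recognize the file as UTF-8 when it is opened directly. A minimal sketch, assuming a hypothetical output file name and sample rows:

```python
import csv

# Hypothetical rows standing in for the data read from business.json
rows = [["business_id", "city"], ["abc123", "Montréal"]]

# 'utf-8-sig' writes the BOM first, then normal UTF-8 text
with open("business_excel.csv", "w", newline="", encoding="utf-8-sig") as out_file:
    csv.writer(out_file).writerows(rows)

# Inspect the raw bytes to confirm the BOM is present
with open("business_excel.csv", "rb") as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'
```

Reading the file back with encoding="utf-8-sig" strips the BOM again, so other Python code is unaffected.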

Upvotes: 1

Carlos Rojas

Reputation: 354

Try adding this line at the top of the file:

# -*- coding: utf-8 -*-

Consider this example:

# -*- coding: utf-8 -*-
import sys

print("my default encoding is : {0}".format(sys.getdefaultencoding()))
string_demo="Montréal"
print(string_demo)

reload(sys) # just in python2.x
sys.setdefaultencoding('UTF8') # just in python2.x

print("my default encoding is : {0}".format(sys.getdefaultencoding()))
print(str(string_demo.encode('utf8')), type(string_demo.encode('utf8')))

In my case, running it under Python 2.x gives this output:

my default encoding is : ascii
Montréal
my default encoding is : UTF8
('Montr\xc3\xa9al', <type 'str'>)

But when I comment out the reload and setdefaultencoding lines, the output is:

my default encoding is : ascii
Montréal
my default encoding is : ascii
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    print(str(string_demo.encode('utf8')), type(string_demo.encode('utf8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
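In Python 2, "Montréal" without a u prefix is a byte string, and calling .encode() on it first decodes it implicitly with the default encoding (ascii), which is what fails here. This Python 3 sketch performs that implicit decode step explicitly to reproduce the same error message:

```python
# The same bytes a Python 2 "Montréal" literal holds in a UTF-8 source file
raw = "Montréal".encode("utf-8")  # b'Montr\xc3\xa9al'

try:
    # Python 2's str.encode() does this decode step silently first
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as err:
    message = str(err)

print(message)
# 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
```

Byte 0xC3 sits at position 5 because "Montr" occupies positions 0 to 4, which matches the traceback above.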

It's most likely a problem with the editor: when there is a real encoding error, Python raises an exception.
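For comparison, under Python 3 (which the question uses) source files are UTF-8 by default, so the coding line is optional, str is always Unicode text, and there is no setdefaultencoding to call. A minimal sketch of the same experiment in Python 3:

```python
import sys

# Python 3's default encoding is fixed to UTF-8
print(sys.getdefaultencoding())  # utf-8

string_demo = "Montréal"
encoded = string_demo.encode("utf-8")
print(encoded, type(encoded))  # b'Montr\xc3\xa9al' <class 'bytes'>

# Decoding with the same codec round-trips exactly
print(encoded.decode("utf-8") == string_demo)  # True
```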

Upvotes: 0
