Reputation: 1684
My CSV file looks like:
"Domain", "A"
rolexkings.ml,1
netmajic.com,1
northumbrianresort.info,2
дольщикиспб.рф,1
To update it, I'm doing the following (a working snippet, not the actual logic, for brevity):
import csv
import shutil
from tempfile import NamedTemporaryFile
from tqdm import tqdm

filename = 'file.csv'
tempfile = NamedTemporaryFile(mode='w', delete=False)
fields = ["Domain", "A"]
with open(filename, 'r', encoding='utf-8') as csvfile, tempfile:
    reader = csv.DictReader(csvfile, fieldnames=fields)
    writer = csv.DictWriter(tempfile, fieldnames=fields)
    next(reader, None)  # skip the headers
    for row in tqdm(reader):
        print(row['Domain'])
        row = {'Domain': row['Domain'], 'A': row['A']}
        writer.writerow(row)
shutil.move(tempfile.name, filename)
As soon as it reaches the non-Latin domain, I get:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-10: character maps to <undefined>
How can I fix it? Thanks!
Upvotes: 0
Views: 175
Reputation: 104752
You need to specify an encoding for tempfile. Without one, it is opened with your platform's default encoding (on Windows, typically a legacy code page such as cp1252, which Python implements with its charmap codec), and that encoding can't represent a Cyrillic string. You probably want to use utf-8, since that's the encoding your input file is being read with.
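A quick way to confirm the diagnosis is to encode one of the domains by hand. This sketch assumes the default encoding on the asker's machine is cp1252, a common Windows default; the question doesn't say which code page is actually in use. Encoding with cp1252 should fail with the same kind of charmap error, while utf-8 round-trips the string.

```python
# Assumption: the platform default encoding is cp1252 (a common Windows code page).
domain = "дольщикиспб.рф"

try:
    domain.encode("cp1252")
except UnicodeEncodeError as exc:
    # cp1252 has no Cyrillic letters, so encoding fails at the first one.
    print("cp1252 failed:", exc)

# UTF-8 can represent every Unicode character, so this succeeds.
encoded = domain.encode("utf-8")
print("utf-8 round-trip ok:", encoded.decode("utf-8") == domain)
```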
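You should probably also add newline="" to both of your file-opening calls (the open() call and the NamedTemporaryFile() call), as that's what the csv module expects: it handles "universal" newlines itself rather than relying on Python's normal newline translation. Without it, newlines embedded inside quoted fields won't be read correctly, and on platforms that use \r\n line endings (such as Windows) the writer's output gets an extra \r on every row, which shows up as blank lines. This might not affect your current data set, but if you want your code to be general, it's a good idea.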
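Putting both suggestions together, the loop from the question might look like the sketch below. It is hedged: update_csv is a hypothetical wrapper name, the tqdm progress bar is left out so the example needs only the standard library, and the row-copying line stands in for the real update logic, which the question doesn't show. Unlike the original snippet, it also calls writer.writeheader(); without that, the header row is dropped on every run, and the next run's next(reader, None) would then skip a real data row.

```python
import csv
import shutil
from tempfile import NamedTemporaryFile

FIELDS = ["Domain", "A"]


def update_csv(filename):
    """Rewrite filename through a temporary file (hypothetical helper name)."""
    # Explicit UTF-8 plus newline="" on the output file, as suggested above.
    # delete=False keeps the file after it is closed so it can be moved.
    tmp = NamedTemporaryFile(mode="w", encoding="utf-8", newline="", delete=False)
    # Same settings on the input file.
    with open(filename, "r", encoding="utf-8", newline="") as csvfile, tmp:
        reader = csv.DictReader(csvfile, fieldnames=FIELDS)
        writer = csv.DictWriter(tmp, fieldnames=FIELDS)
        next(reader, None)  # skip the existing header row
        writer.writeheader()  # write it back so the file keeps a header
        for row in reader:
            # Placeholder for the real update logic.
            writer.writerow({"Domain": row["Domain"], "A": row["A"]})
    # Replace the original file with the rewritten one.
    shutil.move(tmp.name, filename)
```

Called as update_csv('file.csv'), this should rewrite the file in place without the UnicodeEncodeError, keeping the Cyrillic domain intact.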
Upvotes: 1