DictWriter throws UnicodeEncodeError

Question

My CSV file looks like:

"Domain", "A"
rolexkings.ml,1
netmajic.com,1
northumbrianresort.info,2
дольщикиспб.рф,1

And to update it, I am doing the following (a working snippet, with the actual logic left out for brevity):

import csv
import shutil
from tempfile import NamedTemporaryFile

from tqdm import tqdm

filename = 'file.csv'
# Temporary output file; it replaces file.csv at the end
tempfile = NamedTemporaryFile(mode='w', delete=False)
fields = ["Domain", "A"]

with open(filename, 'r', encoding='utf-8') as csvfile, tempfile:

    reader = csv.DictReader(csvfile, fieldnames=fields)
    writer = csv.DictWriter(tempfile, fieldnames=fields)

    next(reader, None)  # skip the headers

    for row in tqdm(reader):
        print(row['Domain'])
        row = {'Domain': row['Domain'], 'A': row['A']}
        writer.writerow(row)

# Overwrite the original file with the rewritten copy
shutil.move(tempfile.name, filename)

As soon as I encounter the non-latin domain, I am thrown:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-10: character maps to <undefined>

How can I fix it? Thanks!

Blckknght · Accepted Answer

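You need to specify an encoding for tempfile. Without one, it is opened with your platform's default encoding, and the 'charmap' codec in the error means that default is a legacy single-byte code page (on Windows, something like cp1252), which can't represent Cyrillic characters. You probably want to use utf-8, since that's the encoding your input file is being read with.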
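A quick way to see where the 'charmap' codec comes from is to encode the domain by hand. This is a minimal sketch that assumes a Windows-style default of cp1252; the real default on a given machine is whatever `locale.getpreferredencoding(False)` reports, so it may differ.

```python
import locale

# Text files opened without encoding= use this locale-dependent default.
# On Windows it is often a code page such as 'cp1252' (an assumption here).
print("default text encoding:", locale.getpreferredencoding(False))

domain = "дольщикиспб.рф"  # the non-Latin domain from the question

# cp1252 is implemented with Python's charmap codec and has no Cyrillic
# letters, so encoding fails the same way writing to the temp file did.
failure = None
try:
    domain.encode("cp1252")
except UnicodeEncodeError as exc:
    failure = (exc.encoding, exc.reason)
    print(exc)

# UTF-8 can encode every Unicode character, so this always succeeds.
encoded = domain.encode("utf-8")
print(len(domain), "characters ->", len(encoded), "UTF-8 bytes")
```

Each Cyrillic letter takes two bytes in UTF-8, which is why the encoded form is longer than the string.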

You should probably also add newline="" to both of your file-opening calls. The csv module expects this because it handles "universal" newlines itself rather than relying on Python's normal newline translation; on Windows, for example, writing without it turns each \r\n row terminator into \r\r\n, which many programs show as a blank line between rows. It may not cause visible problems with your current data set, but if you want your code to be general, it's a good idea.
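Putting both suggestions together, the snippet from the question might look like this sketch. Assumptions: `tqdm` and the `print` call are dropped to keep it dependency-free, the helper name `rewrite_csv` is hypothetical, and the `writeheader()` call is an addition (the original skips the header and never writes it back). The usage part builds a throwaway sample file in a temporary directory.

```python
import csv
import os
import shutil
from tempfile import NamedTemporaryFile, mkdtemp

FIELDS = ["Domain", "A"]


def rewrite_csv(filename):
    """Rewrite `filename` row by row via a temporary file (hypothetical helper)."""
    # Explicit encoding instead of the locale default, and newline="" so
    # the csv module controls line endings itself.
    temp = NamedTemporaryFile(mode="w", encoding="utf-8", newline="", delete=False)
    with open(filename, "r", encoding="utf-8", newline="") as csvfile, temp:
        reader = csv.DictReader(csvfile, fieldnames=FIELDS)
        writer = csv.DictWriter(temp, fieldnames=FIELDS)
        next(reader, None)  # skip the existing header row
        writer.writeheader()  # assumption: write a clean header back
        for row in reader:
            writer.writerow({"Domain": row["Domain"], "A": row["A"]})
    # The temp file is closed once the with block exits, so it can be moved.
    shutil.move(temp.name, filename)


# Usage: a throwaway sample file containing the Cyrillic domain.
sample_path = os.path.join(mkdtemp(), "file.csv")
with open(sample_path, "w", encoding="utf-8", newline="") as f:
    f.write('"Domain", "A"\nrolexkings.ml,1\nдольщикиспб.рф,1\n')

rewrite_csv(sample_path)

with open(sample_path, encoding="utf-8", newline="") as f:
    result = f.read()
print(ascii(result))  # ascii() avoids console encoding issues when printing
```

Reading the result back with newline="" shows the \r\n terminators exactly as csv.DictWriter wrote them, with no doubled \r.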
