Jishan

Reputation: 1684

DictWriter throws UnicodeEncodeError

My CSV file looks like:

"Domain", "A"
rolexkings.ml,1
netmajic.com,1
northumbrianresort.info,2
дольщикиспб.рф,1

To update it, I am doing the following (a working snippet; the actual logic is omitted for brevity):

import csv
import shutil
from tempfile import NamedTemporaryFile

from tqdm import tqdm

filename = 'file.csv'
tempfile = NamedTemporaryFile(mode='w', delete=False)
fields = ["Domain", "A"]

with open(filename, 'r', encoding='utf-8') as csvfile, tempfile:

    reader = csv.DictReader(csvfile, fieldnames=fields)
    writer = csv.DictWriter(tempfile, fieldnames=fields)

    next(reader, None)  # skip the headers

    for row in tqdm(reader):
        print(row['Domain'])
        row = {'Domain': row['Domain'], 'A': row['A']}
        writer.writerow(row)

shutil.move(tempfile.name, filename)

As soon as I hit the non-Latin domain, I get:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-10: character maps to <undefined>

How can I fix it? Thanks!

Upvotes: 0

Views: 175

Answers (1)

Blckknght

Reputation: 104752

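You need to specify an encoding for tempfile. When no encoding is given, a text-mode NamedTemporaryFile uses the platform's default encoding. The 'charmap' codec in your traceback suggests that default is a legacy single-byte code page (typically a Windows one such as cp1252), which can't represent a Cyrillic string. You probably want to use utf-8, since that's the encoding your input file is being read with.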
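To see why the write fails, here is a minimal sketch that encodes the problem domain by hand. It assumes the default code page is cp1252, which is only a guess based on the 'charmap' wording in the traceback; your machine may use a different one.

```python
# The domain from the question that triggers the error.
domain = "дольщикиспб.рф"

# Assumption: the platform default is cp1252 (a common Windows code page).
try:
    domain.encode("cp1252")
    cp1252_ok = True
    error_text = ""
except UnicodeEncodeError as exc:
    cp1252_ok = False
    error_text = str(exc)  # mentions the 'charmap' codec, as in the traceback

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = domain.encode("utf-8")

print(cp1252_ok)
print(error_text)
print(utf8_bytes.decode("utf-8") == domain)
```

The file object does the same encoding step internally on every write, which is why passing encoding='utf-8' to NamedTemporaryFile should make the error go away.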

You should probably also add newline="" to both of your file opening calls. The csv module expects it, because it handles "universal" newlines itself rather than relying upon Python's normal newline translation. Without it, newlines embedded in quoted fields can be misread, and on platforms that write \r\n line endings an extra \r gets added to each row. This might not matter for your current data set on your current OS, but if you want your code to be general, it's a good idea.
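Putting both suggestions together, a corrected version of the snippet might look like the sketch below. The function name rewrite_csv and the FIELDS constant are made up for illustration, tqdm and the print call are left out to keep it short, and writing the header back out is an assumption on my part (the original snippet skips the header and never rewrites it).

```python
import csv
import shutil
from tempfile import NamedTemporaryFile

FIELDS = ["Domain", "A"]  # hypothetical constant; same field names as the question


def rewrite_csv(filename):
    """Copy filename row by row through a temp file, then replace the original."""
    # encoding="utf-8" avoids the platform default code page;
    # newline="" lets the csv module control line endings itself.
    tmp = NamedTemporaryFile(mode="w", encoding="utf-8", newline="", delete=False)

    with open(filename, "r", encoding="utf-8", newline="") as csvfile, tmp:
        reader = csv.DictReader(csvfile, fieldnames=FIELDS)
        writer = csv.DictWriter(tmp, fieldnames=FIELDS)

        next(reader, None)    # skip the header row of the input
        writer.writeheader()  # assumption: the output should keep a header

        for row in reader:
            # Placeholder for the real update logic.
            writer.writerow({"Domain": row["Domain"], "A": row["A"]})

    # Both files are closed here, so the move is safe on Windows too.
    shutil.move(tmp.name, filename)
```

Calling rewrite_csv('file.csv') would then rewrite the file in place, including the Cyrillic row.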

Upvotes: 1
