Reputation: 3781
My file contains lines like these:
M Aad 4 $
M Aadam 1 $
F Aadje 1 $
M Ådne + 1 $
When I run the following code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import csv, unicodedata, urllib
from unidecode import unidecode
from textblob import TextBlob
with open('names.csv', 'rb') as f:
    reader = csv.reader(f)
    my_list = list(reader)
for a in range(len(my_list)):
    name = my_list[a][0]
    name = unicode(name,'ISO-8859-15')
    print name
I get output like this on some lines:
F <Z^>ydr<edeg> 1 $
There are many similar questions on Stack Overflow, but none of their solutions fit my problem.
How can I fix this problem?
Upvotes: 0
Views: 55
Reputation: 308001
It sounds like your input is not actually UTF-8; it seems to be ISO-8859-* (possibly ISO-8859-15 or ISO-8859-1). 0xC5 is the ISO-8859 encoding of Å (the UTF-8 encoding would be 0xC3 0x85).
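One way to check this yourself is to look at the bytes each encoding produces for Å, and to decode each line as strict UTF-8 first, falling back to an ISO-8859 variant only when that fails. The snippet below is a rough sketch, not a drop-in fix: it uses Python 3 syntax (the code in the question is Python 2), `decode_line` is a hypothetical helper name, and the ISO-8859-15 fallback is an assumption about your file.

```python
# Sketch in Python 3 syntax (assumption: the question's code is Python 2).
# "\u00c5" is the character Å, written as an escape to keep this file ASCII-only.
A_RING = "\u00c5"

# The same character yields different bytes in each encoding.
latin1_bytes = A_RING.encode("iso-8859-1")   # b'\xc5'      (one byte)
utf8_bytes = A_RING.encode("utf-8")          # b'\xc3\x85'  (two bytes)


def decode_line(raw, fallback="iso-8859-15"):
    """Hypothetical helper: decode bytes as UTF-8 if they are valid UTF-8,
    otherwise fall back to the given single-byte encoding (assumed here)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone 0xC5 followed by an ASCII letter is not valid UTF-8,
        # so ISO-8859 input ends up in this branch.
        return raw.decode(fallback)


if __name__ == "__main__":
    for raw in (latin1_bytes + b"dne", utf8_bytes + b"dne"):
        print(repr(raw), "->", decode_line(raw))
```

Both byte strings decode to the same name, so the helper tolerates a file that mixes the two encodings. It cannot tell ISO-8859-1 from ISO-8859-15, since any byte sequence is valid in both.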
Upvotes: 2