yusuf

Reputation: 3781

Converting characters from a non-UTF-8 file to their English equivalents in Python

My file contains lines like these:

M  Aad                                  4                                             $
M  Aadam                                          1                                   $
F  Aadje                                1                                             $
M  Ådne                      +                 1                                      $

When I run the following code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import csv, unicodedata, urllib
from unidecode import unidecode
from textblob import TextBlob

with open('names.csv', 'rb') as f:
    reader = csv.reader(f)
    my_list = list(reader)

for row in my_list:
    name = unicode(row[0], 'ISO-8859-15')
    print name

I get output like this on some lines:

F  <Z^>ydr<edeg>                                      1                                 $

There are many similar questions on Stack Overflow, but none of their solutions fit my problem.

How can I fix this problem?

Upvotes: 0

Views: 55

Answers (1)

Joachim Sauer

Reputation: 308001

It sounds like your input is not actually UTF-8; it seems to be ISO-8859-* (possibly ISO-8859-15 or ISO-8859-1). 0xC5 is the ISO-8859-1/ISO-8859-15 encoding of Å; the UTF-8 encoding would be 0xC3 0x85.
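A minimal sketch of that point in Python 3 (the question's code is Python 2, so this is a translation, not the asker's code). It encodes Å both ways to show the byte difference, simulates one line of the file as ISO-8859-15 bytes (an assumption about how the file is actually stored), and shows that decoding it as UTF-8 fails while ISO-8859-15 works. The `to_ascii` helper is hypothetical: it folds accents with the standard library's `unicodedata.normalize('NFKD', ...)` rather than the `unidecode` package imported in the question.

```python
import unicodedata


def to_ascii(text):
    """Hypothetical helper: fold accented Latin letters to ASCII, e.g. 'Å' -> 'A'.

    NFKD splits 'Å' into 'A' + U+030A (combining ring above); the combining
    mark is then dropped by encoding to ASCII with errors='ignore'.
    """
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# The same letter 'Å' (U+00C5) as raw bytes in the two encodings.
latin_bytes = 'Å'.encode('iso-8859-15')  # b'\xc5'
utf8_bytes = 'Å'.encode('utf-8')         # b'\xc3\x85'

# One line of the file as it might sit on disk (assumed ISO-8859-15).
raw_line = 'M  Ådne'.encode('iso-8859-15')

# Decoding with the wrong codec fails: 0xC5 starts a two-byte UTF-8
# sequence, but the next byte ('d', 0x64) is not a continuation byte.
try:
    raw_line.decode('utf-8')
except UnicodeDecodeError as exc:
    print('not UTF-8:', exc.reason)

name = raw_line.decode('iso-8859-15')
print(name)            # M  Ådne
print(to_ascii(name))  # M  Adne
```

With a real file in Python 3, passing `encoding='iso-8859-15'` (and `newline=''`, as the `csv` module recommends) to `open()` performs the same decoding before `csv.reader` sees the text. Note that NFKD folding silently drops letters that have no decomposition, such as ø or æ; `unidecode(name)` from the question's import is an alternative that transliterates those instead of discarding them.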

Upvotes: 2
