I'm writing a little script that lets me import my Facebook contacts' email addresses into Gmail/Android. My input file has Unicode characters, like Jasmin L\u00f3pez. The generated CSV output file looks like this:
Andr\u00e9 Zzz,,,,,,,,,,,,,,,,,,,,,,,,,,fbcontacts ::: * My Contacts,* Home,[email protected]
Andr\u00e9ia Ggg,,,,,,,,,,,,,,,,,,,,,,,,,,fbcontacts ::: * My Contacts,* Home,[email protected]
Andr\u00e9s Bbb,,,,,,,,,,,,,,,,,,,,,,,,,,fbcontacts ::: * My Contacts,* Home,[email protected]
As you can see, I have problems with encodings. I'm creating a Google Contacts CSV file, but I need the names to display properly. This is the function I use to write the CSV:
def writecsv(self):
    # Compare by value; "is not ''" checks object identity, not equality
    if self.outfile != '':
        #fh = open(self.outfile, 'wb')
        #fh = codecs.open(self.outfile, "wb", "utf-8")
        fh = codecs.open(self.outfile, 'wb', encoding="latin-1")
    else:
        fh = sys.stdout
    csvhdlr = csv.writer(fh, quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csvhdlr.writerow("Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory Server,Mileage,Occupation,Hobby,Sensitivity,Priority,Subject,Notes,Group Membership,E-mail 1 - Type,E-mail 1 - Value".split(','))
    for contact in self.clist:
        #csvhdlr.writerow(dict((vname, vtype, vnotes, vstereotype, vauthor, valias, vgenfile.encode('utf-8')) for vname, vtype, vnotes, vstereotype, vauthor, valias, vgenfile in row.iteritems()))
        row = contact.fullname + ',,,,,,,,,,,,,,,,,,,,,,,,,,fbcontacts ::: * My Contacts,* Home,' + contact.email
        csvhdlr.writerow(row.split(','))
Any ideas? I'm quite new to Python, and every time I have to deal with encodings, it doesn't work the way I'd like. =(
Thanks a lot for your help!
If I understand you correctly, your file doesn't actually contain non-ASCII Unicode characters; it contains Unicode escape sequences like "\u00f3" that stand for those characters. If your file really contains the string "Jasmin L\u00f3pez" (with a literal backslash followed by u), you need to decode it into actual Unicode characters before writing it. Take a look at the unicode_escape codec.
>>> x = b"\u00f3"
>>> print x
\u00f3
>>> print x.decode('unicode_escape')
ó
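The session above is Python 2. Under Python 3 the csv module works on text files directly, so a minimal sketch of the whole fix might be: decode the escapes, then let csv write a UTF-8 file. The helper names (`decode_escapes`, `write_contacts`, `HEADER`), the trimmed four-column header, the (name, email) tuples standing in for the question's contact objects, and the sample address are all hypothetical. It also assumes the input is ASCII apart from the escapes, because unicode_escape would read any other bytes as Latin-1.

```python
import csv
import io

# Hypothetical, trimmed header: only the columns this sketch fills in.
# The full Google header from the question could be used instead.
HEADER = ["Name", "Group Membership", "E-mail 1 - Type", "E-mail 1 - Value"]


def decode_escapes(raw):
    """Turn literal escapes such as '\\u00f3' into real characters.

    Assumes `raw` is ASCII apart from the escapes: encode("ascii") raises
    UnicodeEncodeError otherwise, rather than silently mangling real
    non-ASCII text (unicode_escape would read such bytes as Latin-1).
    """
    return raw.encode("ascii").decode("unicode_escape")


def write_contacts(contacts, fh):
    """Write hypothetical (name, email) pairs as CSV rows to a text stream."""
    writer = csv.writer(fh, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(HEADER)
    for name, email in contacts:
        # Pass each row as a list: a comma inside a name is then quoted
        # instead of being split into an extra column.
        writer.writerow([decode_escapes(name),
                         "fbcontacts ::: * My Contacts", "* Home", email])


if __name__ == "__main__":
    # For a real file, open it in text mode as UTF-8, with newline=""
    # as the csv docs recommend:
    #   with open("contacts.csv", "w", newline="", encoding="utf-8") as fh:
    #       write_contacts(contacts, fh)
    buf = io.StringIO()
    write_contacts([("Jasmin L\\u00f3pez", "jasmin@example.com")], buf)
    print(buf.getvalue())
```

The demo prints the header followed by `Jasmin López,fbcontacts ::: * My Contacts,* Home,jasmin@example.com`. On Python 2 the csv module does not accept unicode input, so there you would instead encode each decoded field to UTF-8 bytes before passing it to `writerow`.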