Reputation: 4266
I'm trying to write some data to a CSV file, but Unicode is killing my vibe.
My data is in dictionary format; here's a snippet:
{'category': u'Best food blog written by a linguist\xa0', 'runners_up': [], 'winner': [u'shesimmers.com'], 'category_url': 'http://www.chicagoreader.com/chicago/best-food-blog-written-by-a-linguist/BestOf?oid=4101663'}
And this is the segment of my code where I'm using `DictWriter`:
data = utf_8_encoder(data)
with open('best_food_n_drink.csv', 'w') as csvfile:
    categories = ['category', 'category_url', 'winner', 'runners_up']
    writer = csv.DictWriter(csvfile, delimiter=',', fieldnames=categories)
    writer.writeheader()
    for row in data:
        writer.writerow(row)
`utf_8_encoder` is a function I defined earlier:
def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        line.encode('utf-8')
    return unicode_csv_data
I keep getting error messages like `'dict' object has no attribute 'encode'`. I've tried forgoing the encoder function and substituting `row.values().encode('utf-8')` in the for loop at the bottom, but that just tells me `'list' object has no attribute 'encode'`. I've also tried replacing `('utf-8')` with `('ascii', 'ignore')`, but I just can't figure it out.
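For anyone landing here, both error messages have the same cause: `.encode()` is a string method, and the code calls it on a `dict` (each item of `data`) and then on a `list` (`row.values()`). There is also a quieter bug in `utf_8_encoder`: `.encode()` returns a new object instead of changing the string in place, so the loop throws its results away. A small Python 3 sketch reproducing both points (the question's code is Python 2, and `try_encode` is a hypothetical helper made up for this demonstration):

```python
# Python 3 sketch: only strings have .encode(); containers around them don't.
row = {'category': 'Best food blog written by a linguist\xa0', 'winner': ['shesimmers.com']}


def try_encode(obj):
    """Return obj.encode('utf-8'), or the AttributeError message if obj can't be encoded."""
    try:
        return obj.encode('utf-8')
    except AttributeError as exc:
        return str(exc)


dict_result = try_encode(row)                 # .encode() on the whole dict
list_result = try_encode(list(row.values()))  # .encode() on a list of values
str_result = try_encode(row['category'])      # .encode() on an actual string
print(dict_result)  # 'dict' object has no attribute 'encode'
print(list_result)  # 'list' object has no attribute 'encode'
print(str_result)   # b'Best food blog written by a linguist\xc2\xa0'

# .encode() returns a new bytes object and leaves the original untouched, so a
# loop like `for line in data: line.encode('utf-8')` discards every result.
name = 'Sabatino\u2019s'
name.encode('utf-8')
print(type(name).__name__)  # str
```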
Upvotes: 1
Views: 14382
Reputation: 3279
Yet another solution is to write comprehensive functions that check for further types beyond just `unicode` and `list`. I know the original question doesn't need this, but anyone could land here trying to convert a complex `dict` (with inner dicts, lists, ...), so here is my contribution:
# Python 2 code: relies on the unicode type and dict.iteritems()
def array_to_utf(a):
    autf = []
    for v in a:
        if isinstance(v, unicode):
            autf.append(v.encode('utf-8'))
        elif isinstance(v, dict):
            autf.append(dict_to_utf(v))
        elif isinstance(v, list):
            autf.append(array_to_utf(v))
        else:
            autf.append(v)
    return autf

def dict_to_utf(d):
    dutf = {}
    for k, v in d.iteritems():
        if isinstance(v, unicode):
            dutf[k] = v.encode('utf-8')
        elif isinstance(v, list):
            dutf[k] = array_to_utf(v)
        elif isinstance(v, dict):
            dutf[k] = dict_to_utf(v)
        else:
            dutf[k] = v
    return dutf

test = {1: u'1', 2: '2', 3: {'x': u'x', 'y': 'y'}, 4: [u'ara', 's', 123], 5: 123}
print(dict_to_utf(test))
# {1: '1', 2: '2', 3: {'y': 'y', 'x': 'x'}, 4: ['ara', 's', 123], 5: 123}
Both functions are recursive, both on their own and mutually (each one calls the other).
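The two mutually recursive functions can also be folded into a single function that dispatches on type. Below is a minimal sketch of that idea for Python 3, where `str` takes the place of `unicode` and `dict.items()` replaces `iteritems()`; the name `encode_nested` is made up for the example, and the `tuple` branch is an extra case the functions above don't handle.

```python
def encode_nested(obj, encoding='utf-8'):
    """Recursively encode every str found in nested dicts, lists and tuples.

    Python 3 sketch: str plays the role of Python 2's unicode, and encoding
    produces bytes. Dict keys are left as they are (like dict_to_utf);
    other leaves (ints, None, bytes, ...) are returned unchanged.
    """
    if isinstance(obj, str):
        return obj.encode(encoding)
    if isinstance(obj, dict):
        return {k: encode_nested(v, encoding) for k, v in obj.items()}
    if isinstance(obj, list):
        return [encode_nested(v, encoding) for v in obj]
    if isinstance(obj, tuple):
        # Extra case, not handled by array_to_utf/dict_to_utf.
        return tuple(encode_nested(v, encoding) for v in obj)
    return obj


test = {1: '1', 2: b'2', 3: {'x': 'x', 'y': b'y'}, 4: ['ara', b's', 123], 5: 123}
result = encode_nested(test)
print(result)
# {1: b'1', 2: b'2', 3: {'x': b'x', 'y': b'y'}, 4: [b'ara', b's', 123], 5: 123}
```

On Python 3 you generally don't need this for `csv` at all, since the file object does the encoding; it is mainly useful when some API explicitly wants `bytes`.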
Upvotes: 1
Reputation: 497
With Python 3.4, using `io.open(filename, 'w', encoding='utf8')` instead of `open(filename, 'w')` solved the same problem for me.
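Here is a small Python 3 sketch of what that change does, using one row from the question with the `winner` list already flattened to a string by hand; the file name `linguist.csv` is only for the example. In Python 3, `io.open` is the same function as the built-in `open`, so `open(filename, 'w', encoding='utf8')` behaves identically.

```python
import csv
import io

row = {'category': 'Best food blog written by a linguist\xa0', 'winner': 'shesimmers.com'}

# The file object is given the encoding, so rows can hold plain str values.
# newline='' is the setting the csv module docs recommend for files it writes.
with io.open('linguist.csv', 'w', encoding='utf8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['category', 'winner'])
    writer.writeheader()
    writer.writerow(row)

# Read the raw bytes back to see what actually landed on disk.
with open('linguist.csv', 'rb') as f:
    raw = f.read()
print(raw)
# b'category,winner\r\nBest food blog written by a linguist\xc2\xa0,shesimmers.com\r\n'
```

The `\xc2\xa0` pair is the UTF-8 encoding of the non-breaking space (`\xa0`) at the end of the category name.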
Upvotes: 1
Reputation: 180481
I'm not sure what format you expect the output in, but this will encode your strings:
import csv

def map_to(d):
    # iterate over the key/value pairs
    for k, v in d.items():
        # if v is a list, join and encode it; otherwise just encode it, as it is a string
        d[k] = ",".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")

map_to(data)
with open('best_food_n_drink.csv', 'w') as csvfile:
    categories = ['category', 'category_url', 'winner', 'runners_up']
    writer = csv.DictWriter(csvfile, fieldnames=categories)
    writer.writeheader()
    writer.writerow(data)
This will output something like the following, but with your mixture of strings and lists I don't really know what it should end up looking like:
category,category_url,winner,runners_up
Best food blog written by a linguist ,http://www.chicagoreader.com/chicago/best-food-blog-written-by-a-linguist/BestOf?oid=4101663,shesimmers.com,
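Note that `map_to` targets Python 2, whose `csv` module works with byte strings. On Python 3 the same `.encode()` calls hand `bytes` to the writer, and since `csv` converts any non-string value with `str()`, the file ends up containing the bytes' repr. A quick check against an in-memory buffer:

```python
import csv
import io

# Encoding first (as map_to does) hands bytes to the writer on Python 3;
# csv stringifies non-str values with str(), which gives the bytes repr.
buf = io.StringIO()
csv.DictWriter(buf, fieldnames=['winner']).writerow({'winner': 'shesimmers.com'.encode('utf-8')})
encoded_out = buf.getvalue()

# Passing the str unchanged writes the text itself.
buf = io.StringIO()
csv.DictWriter(buf, fieldnames=['winner']).writerow({'winner': 'shesimmers.com'})
plain_out = buf.getvalue()

print(repr(encoded_out))  # "b'shesimmers.com'\r\n"
print(repr(plain_out))    # 'shesimmers.com\r\n'
```

So on Python 3, leave the values as `str` and pass `encoding=` to `open()` instead.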
Now that we have discovered you actually have a list of dicts, we need to iterate over the list, but the logic is still the same; we just run the function on each dict in the loop:
data = [{'category': u"Best restaurant that's been around forever and is still worth the trip\xa0", 'runners_up': [u'Frontera Grill', u'Chicago Diner ', u'Sabatino\u2019s', u'Twin Anchors'], 'winner': [u'Lula Cafe'], 'category_url': 'http://www.chicagoreader.com/chicago/BestOf?category=1979894&year=2011'},
{'category': u'Best bang for your buck\xa0', 'runners_up': [u'Frasca Pizzeria & Wine Bar', u'Chutney Joe\u2019s', u'"My boyfriend!"'], 'winner': [u'Big Star', u'Sultan\u2019s Market']}]
import csv

def map_to(d):
    for k, v in d.items():
        d[k] = ",".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")

with open('best_food_n_drink.csv', 'w') as csvfile:
    categories = ['category', 'category_url', 'winner', 'runners_up']
    writer = csv.DictWriter(csvfile, fieldnames=categories)
    writer.writeheader()
    # get each dict from the list
    for d in data:
        # run the encode function on it
        map_to(d)
        writer.writerow(d)
I presume `'category_url'` actually exists in the second dict.
To catch `None` values and avoid encoding errors, add a check to the function:
def map_to(d):
    for k, v in d.items():
        # catch None values
        if v is not None:
            d[k] = " ".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")
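On Python 3 the encoding step can be dropped entirely, which also removes the need to catch `None` (the `csv` module writes `None` as an empty field). A sketch under those assumptions: `flatten` is a hypothetical helper that only joins list values, the rows are based on the question's data, and the `None` in the second row is added here just to exercise that case. `restval=''` (also the default) fills in the `'category_url'` key that the second dict lacks.

```python
import csv

# Rows based on the question's data. The second row has no 'category_url',
# and its None runners_up value is added for this example.
data = [
    {'category': "Best restaurant that's been around forever and is still worth the trip\xa0",
     'runners_up': ['Frontera Grill', 'Chicago Diner ', 'Sabatino\u2019s', 'Twin Anchors'],
     'winner': ['Lula Cafe'],
     'category_url': 'http://www.chicagoreader.com/chicago/BestOf?category=1979894&year=2011'},
    {'category': 'Best bang for your buck\xa0',
     'runners_up': None,
     'winner': ['Big Star', 'Sultan\u2019s Market']},
]


def flatten(d, sep=' '):
    """Hypothetical helper: join list values into one cell, leave str and None alone."""
    return {k: sep.join(v) if isinstance(v, list) else v for k, v in d.items()}


categories = ['category', 'category_url', 'winner', 'runners_up']
with open('best_food_n_drink.csv', 'w', newline='', encoding='utf-8') as csvfile:
    # restval fills keys missing from a row; csv writes None as an empty field.
    writer = csv.DictWriter(csvfile, fieldnames=categories, restval='')
    writer.writeheader()
    for d in data:
        writer.writerow(flatten(d))

# Read the file back to check what was written.
with open('best_food_n_drink.csv', newline='', encoding='utf-8') as csvfile:
    rows = list(csv.DictReader(csvfile))

print(rows[0]['winner'])              # Lula Cafe
print(repr(rows[1]['category_url']))  # ''
print(repr(rows[1]['runners_up']))    # ''
```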
Depending on what you plan on doing with the data, storing it as JSON might be useful:
import json
with open('best_food_n_drink.json', 'w') as js:
    json.dump(data, js)
Then, to get the list of data back:
import json
with open('best_food_n_drink.json') as js:
    data = json.load(js)
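Storing JSON also sidesteps the encoding question: `json.dump` escapes non-ASCII characters as `\uXXXX` by default (`ensure_ascii=True`), so the file is plain ASCII and the data comes back unchanged. A minimal Python 3 round trip with one row from the question:

```python
import json

data = [{'category': 'Best food blog written by a linguist\xa0',
         'runners_up': [],
         'winner': ['shesimmers.com']}]

with open('best_food_n_drink.json', 'w') as js:
    # ensure_ascii=True (the default) writes \xa0 as the escape \u00a0,
    # so the platform's default file encoding doesn't matter here.
    json.dump(data, js)

with open('best_food_n_drink.json') as js:
    text = js.read()
print(text)
# [{"category": "Best food blog written by a linguist\u00a0", "runners_up": [], "winner": ["shesimmers.com"]}]

loaded = json.loads(text)
print(loaded == data)  # True
```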
Upvotes: 1