Reputation: 20831
I am trying to create a .csv file from data that I have stored in a list from the Twitter search API. I have saved the last 100 tweets containing a keyword that I chose (in this case 'reddit'), and I am trying to save each tweet into its own cell in a .csv file. My code is below, and it fails with this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)
If anyone knows what I can do to fix this it would be greatly appreciated!
import sys
import os
import urllib
import urllib2
import json
from pprint import pprint
import csv
import sentiment_analyzer
import codecs

class Twitter:
    def __init__(self):
        self.api_url = {}
        self.api_url['search'] = 'http://search.twitter.com/search.json?'

    def search(self, params):
        url = self.make_url(params, apitype='search')
        data = json.loads(urllib2.urlopen(url).read().decode('utf-8').encode('ascii', 'ignore'))
        txt = []
        for obj in data['results']:
            txt.append(obj['text'])
        return '\n'.join(txt)

    def make_url(self, params, apitype='search'):
        baseurl = self.api_url[apitype]
        return baseurl + urllib.urlencode(params)

if __name__ == '__main__':
    try:
        query = sys.argv[1]
    except IndexError:
        query = 'reddit'
    t = Twitter()
    s = sentiment_analyzer.SentimentAnalyzer()
    params = {'q': query, 'result_type': 'recent', 'rpp': 100}
    urlName = t.make_url(params)
    print urlName
    txt = t.search(params)
    print s.analyze_text(txt)
    myfile = open('reddit.csv', 'wb')
    wr = csv.writer(myfile, quoting=csv.QUOTE_MINIMAL)
    wr.writerow(txt)
Upvotes: 1
Views: 1527
Reputation: 123473
From the Python 2 documentation for the csv module:
Note
This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.
That said, you can probably parse the .csv file yourself without too much difficulty using Python's built-in Unicode string support -- there's also this answer.
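As a rough illustration of the "all input should be UTF-8" advice quoted above, here is a minimal sketch of the encoding step. The helper name encode_row and the sample tweet are made up for the example and are not part of the question's code. It first reproduces the failure with the 'ascii' codec, then encodes the same text as UTF-8 instead.

```python
def encode_row(cells, encoding='utf-8'):
    """Return the cells of one CSV row as encoded byte strings.

    encode_row is a hypothetical helper, not part of the csv module.
    """
    return [cell.encode(encoding) for cell in cells]


# Sample text containing U+2019, the character named in the traceback.
tweet = u'\u2019twas posted on reddit'

# The 'ascii' codec cannot represent U+2019, so this raises the same
# UnicodeEncodeError that the question reports.
try:
    tweet.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode character, so this succeeds.
row = encode_row([tweet])

# Under Python 2, byte strings like these are what csv.writer expects,
# e.g. csv.writer(open('reddit.csv', 'wb')).writerow(row)
```

Note that the row is a list holding one tweet. Passing a bare string to writerow would make the writer treat each character as a separate cell.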
Upvotes: 6
Reputation: 365767
You realize this kind of problem is exactly the reason behind Python 3.
I'm assuming you have a good reason for insisting on Python 2 instead of Python 3. Maybe you're trying to deploy this on a hosting site that gives you Python 2.7 and that's it, or you're running an ancient OS that Python 3 hasn't been ported to, or whatever.
But if not, just switch. The csv module in Python 2 doesn't handle Unicode, and has some weird quirks even when you do encode/decode explicitly; the one in Python 3 is all Unicode, and relies on the underlying file object to deal with the underlying charset.
You will need to change a couple of things, but 2to3 -w twitter.py will take care of all of it except possibly removing the b from open('reddit.csv', 'wb').
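For what it's worth, the writing step in Python 3 might look something like the sketch below. The function name write_tweets is invented for illustration, and it assumes you already have the tweets as a list of str (rather than the single newline-joined string the question builds). The file is opened in text mode with an explicit encoding, so the file object handles the charset and the csv module only ever sees text.

```python
import csv


def write_tweets(path, tweets):
    """Write each tweet as its own one-cell row in a UTF-8 encoded CSV file.

    write_tweets is a hypothetical helper; tweets is assumed to be a
    list of str.
    """
    # Text mode ('w', not 'wb'); newline='' lets the csv module control
    # line endings itself, as the csv documentation recommends.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        for tweet in tweets:
            # Wrap the tweet in a list so the whole text lands in one cell.
            writer.writerow([tweet])
```

Calling write_tweets('reddit.csv', ['It\u2019s on reddit']) should then produce a one-line file without any manual encode or decode calls.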
Upvotes: 0