Abhishek Sharma

Reputation: 2039

UnicodeEncodeError when pulling tweets: 'charmap' codec can't encode character

I am trying to pull tweets from my timeline, but I am able to retrieve only half of them before the script fails with: 'charmap' codec can't encode character u'\u2026': character maps to <undefined>. I tried different encodings (utf-8, ASCII, latin-1 and cp1252), but I get the same result, so I think the encoding is not actually being changed. How should I change the encoding, and which encoding should I choose for pulling tweets? I am using Windows 7 and Python 2.7.8. This is my code:

import tweepy
import csv 
consumer_key = ''
consumer_secret = ''
access_token = '' 
access_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
data = api.get_user('')
# Open/Create a file to append data
csvFile = open('hollywood.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile, delimiter=' ')
i = 0
for tweet in tweepy.Cursor(api.user_timeline).items():
    #Write a row to the csv file/ I use encode utf-8
    csvWriter.writerow([tweet.created_at, tweet.text.encode('cp1252')])
    print tweet.created_at, tweet.text
    i+=1
    if i%5 == 0:
        print i
csvFile.close()

Upvotes: 0

Views: 2388

Answers (2)

Concorde

Reputation: 87

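You need to make sure the data written to the file is UTF-8, so encode the text fed to the writer. Note that on Python 2.7 the built-in open() has no encoding parameter (it exists only on Python 3, where it is spelled encoding, not encode), and the Python 2 csv module writes byte strings, so open the file in binary append mode instead.

Try

tweet.text.encode('utf-8')

csvFile = open('hollywood.csv', 'ab')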
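
For readers on Python 3, where open() does accept an encoding argument, the same idea might look like the sketch below. This is only an illustration under that assumption, not the asker's code: the helper name append_rows and the sample row are hypothetical, and the space delimiter is copied from the question.

```python
import csv
import io


def append_rows(path, rows):
    """Append (created_at, text) pairs to a CSV file encoded as UTF-8.

    Hypothetical helper for illustration; assumes Python 3, where the
    csv module works with text and the file object does the encoding.
    """
    # newline='' lets the csv module control line endings itself.
    with io.open(path, mode='a', encoding='utf-8', newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=' ')
        for created_at, text in rows:
            # No .encode() here: the file object encodes to UTF-8 on write.
            writer.writerow([created_at, text])
```

Calling append_rows('hollywood.csv', [(tweet.created_at, tweet.text)]) inside the loop would replace the manual encode step. On Python 2.7 this sketch does not apply as written, because the csv module there expects byte strings.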

Upvotes: 0

Andrey

Reputation: 60055

Try:

tweet.text.encode('utf8')

UTF-8 can encode every Unicode character, so it will not fail this way. U+2026 (horizontal ellipsis) can't be encoded in Latin-1 or ASCII.

Works perfectly:

>>> u"\u2026".encode('utf8')
'\xe2\x80\xa6'
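
To see which of the encodings mentioned in the question can represent U+2026, a quick check along these lines may help. It is a sketch, not part of the original answer: the helper can_encode is a hypothetical name, and cp437 is included only as an assumed example of a Windows console code page. The code is written to run on Python 3.

```python
ELLIPSIS = u"\u2026"  # HORIZONTAL ELLIPSIS, the character from the error


def can_encode(text, codec):
    """Return True if text can be encoded with codec (hypothetical helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Codecs tried in the question, plus cp437 as an assumed console code page.
for codec in ("ascii", "latin-1", "cp437", "cp1252", "utf-8"):
    print(codec, can_encode(ELLIPSIS, codec))
```

If cp1252 reports True here while the script still fails, the failing call may be the print statement rather than the encode() call, since print has to encode the text for the console. That is worth checking in the traceback.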

Upvotes: 2
