Reputation: 5
I'm using tweepy to capture some tweets in Portuguese and saving them to a CSV file. All the tweet text was saved with escaped byte sequences in place of the accented characters, and now I can't convert it back to the correct format.
My code for capturing the tweets is:
csvFile = open('ua.csv', 'a')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.user_timeline, id=usuario, count=10,
                           lang="en",
                           since="2018-12-01").items():
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
I'm reading the results like this:
test = pd.read_csv('ua.csv', header=None)
test.columns = ["date", "text"]
result = test['text'][0]
print(result)
'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'
The result I need is:
print(result)
'Aproveita essa promoção aqui!'
I tried this code to convert:
print(result.decode('utf-8'))
and got this error message:
AttributeError: 'str' object has no attribute 'decode'
What am I doing wrong?
Upvotes: 0
Views: 762
Reputation: 178429
Open the file with the encoding to be used. Don't encode it manually (Zen of Python: Explicit is better than implicit):
import csv

# newline='' per csv documentation
# encoding='utf-8-sig' if you plan on using Excel to read the csv, else 'utf8' is fine.
with open('ua.csv', 'a', encoding='utf-8-sig', newline='') as csvFile:
    csvWriter = csv.writer(csvFile)
    for tweet in tweepy.Cursor(api.user_timeline, id=usuario, count=10,
                               lang="en",
                               since="2018-12-01").items():
        # Write the str directly; the file object handles the UTF-8 encoding.
        csvWriter.writerow([tweet.created_at, tweet.text])
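One detail about combining the 'a' mode with encoding='utf-8-sig': the byte-order mark (BOM) is only written when the file starts out empty, because Python's text layer suppresses it when appending to a non-empty file. The sketch below (the file name demo.csv and the rows are made up for illustration) appends twice, checks that the raw bytes contain a single BOM, and shows that reading back with plain 'utf-8' leaves the BOM as U+FEFF at the start of the first cell, while 'utf-8-sig' strips it.

```python
import csv
import os

path = 'demo.csv'  # hypothetical file name used only for this demo
if os.path.exists(path):
    os.remove(path)

# Append twice, as repeated runs of the tweet-capture loop would.
for text in ['Aproveita essa promoção aqui!', 'Mais uma promoção']:
    with open(path, 'a', encoding='utf-8-sig', newline='') as f:
        csv.writer(f).writerow(['2018-12-01', text])

with open(path, 'rb') as f:
    raw = f.read()
# The UTF-8 BOM (EF BB BF) appears once, at the very start of the file.
print(raw.count(b'\xef\xbb\xbf'), raw.startswith(b'\xef\xbb\xbf'))

# Plain 'utf-8' keeps the BOM as U+FEFF in the first cell...
with open(path, encoding='utf-8', newline='') as f:
    rows_utf8 = list(csv.reader(f))
# ...while 'utf-8-sig' removes it.
with open(path, encoding='utf-8-sig', newline='') as f:
    rows_sig = list(csv.reader(f))

print(repr(rows_utf8[0][0]))
print(rows_sig)
```

So whatever reads the file back (Excel, pandas, or csv.reader) should use 'utf-8-sig' as well, or be prepared to drop a leading U+FEFF.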
Here's a working example:
import csv
import pandas as pd
with open('ua.csv', 'w', encoding='utf-8-sig', newline='') as csvFile:
    csvWriter = csv.writer(csvFile)
    csvWriter.writerow(['timestamp', 'Aproveita essa promoção aqui!'])
test = pd.read_csv('ua.csv', encoding='utf-8-sig', header=None)
print(test)
Output:
           0                              1
0  timestamp  Aproveita essa promoção aqui!
Upvotes: 0
Reputation: 96360
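The problem is that you are creating a bytes object when you .encode your tweet; you don't need to do this.
A csv.writer object converts whatever you pass to it into a string with str(), and for a bytes object that produces the literal b'...' representation, escape sequences included.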
Note:
In [1]: import csv
In [2]: s = 'Aproveita essa promoção aqui!'
In [3]: print(s)
Aproveita essa promoção aqui!
In [4]: print(s.encode())
b'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'
In [5]: with open('test.txt', 'a') as f:
   ...:     writer = csv.writer(f)
   ...:     writer.writerow([1, 3.4, 'Aproveita essa promoção aqui!'.encode()])
   ...:
In [6]: !cat test.txt
1,3.4,b'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'
So just use:
csvWriter.writerow([tweet.created_at, tweet.text])
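That fixes new rows, but rows already written the old way hold the text of a bytes literal, so pandas reads them back as a str such as "b'Aproveita essa promo\xc3\xa7...'". That is why result.decode fails: in Python 3 only bytes objects have .decode. A minimal sketch for recovering those cells, assuming every affected cell was produced by writing str(some_bytes) through csv.writer: parse the literal back into bytes with ast.literal_eval, then decode it. The helper name fix_bytes_literal is hypothetical.

```python
import ast
import csv
import io


def fix_bytes_literal(cell):
    """Hypothetical helper: turn a cell holding str(some_bytes) back into text.

    Assumes the bytes were UTF-8 encoded. Cells that don't look like a
    bytes literal are returned unchanged.
    """
    if isinstance(cell, str) and cell.startswith(("b'", 'b"')):
        value = ast.literal_eval(cell)  # safely parses the literal into bytes
        if isinstance(value, bytes):
            return value.decode('utf-8')
    return cell


# Reproduce the broken row in memory, the same way the question's loop did.
buf = io.StringIO()
csv.writer(buf).writerow(['2018-12-01 10:00:00',
                          'Aproveita essa promoção aqui!'.encode('utf-8')])
buf.seek(0)
date, text = next(csv.reader(buf))

print(text)                     # b'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'
print(fix_bytes_literal(text))  # Aproveita essa promoção aqui!
```

With the DataFrame from the question, the same helper could be applied column-wise, e.g. test['text'] = test['text'].map(fix_bytes_literal).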
Upvotes: 1
Reputation: 49920
The pandas read_csv function has an encoding parameter:
Encoding to use for UTF when reading/writing (ex. ‘utf-8’).
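The parameter tells pandas how to decode the file's raw bytes into text. A small sketch, using made-up data encoded as Latin-1 rather than UTF-8: with the default encoding ('utf-8') the read fails, and passing the encoding the file was actually written in fixes it.

```python
import io

import pandas as pd

# Bytes of a one-row CSV encoded as Latin-1 instead of UTF-8 (illustrative data).
raw = '2018-12-01,Aproveita essa promoção aqui!\n'.encode('latin-1')

try:
    pd.read_csv(io.BytesIO(raw), header=None)  # default encoding is utf-8
except UnicodeDecodeError as exc:
    print('utf-8 failed:', exc.reason)

# Telling pandas the real encoding decodes the accented characters correctly.
df = pd.read_csv(io.BytesIO(raw), header=None, encoding='latin-1')
print(df.iloc[0, 1])
```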
Upvotes: 0