How do I split the text of a tweet so that each word goes in its own column of a CSV file?

Question

I am outputting tweets to a CSV file and want to split the text portion so that each word goes in its own column, so that I can run it through a classifier using Python.

for tweet in alltweets:

    #Loop to only return the tweets that have been posted in the last 24 hours     
    if (datetime.datetime.now() - tweet.created_at).days < 1:
        # transform the tweepy tweets into a 2D array that will populate the csv    
        outtweets.append([tweet.user.name, tweet.created_at, tweet.text.encode("utf-8")])

    else:
        deadend = True
        return
    if not deadend:
        page += 1

# write the csv    
with open('%s_tweets.csv' % screen_name, 'w') as f:
    writer = csv.writer(f)
    writer.writerow(["name", "created_at", "text"])
    writer.writerows(outtweets)
pass

** EDIT **

** EDIT 2 **

outtweets.append(list(itertools.chain([tweet.user.name, tweet.created_at],tweet.text.encode("utf-8").split(' '))))
TypeError: a bytes-like object is required, not 'str'

Rajesh Chamarthi · Accepted Answer

Since tweet.text.encode("utf-8") is a single string, you can split it on spaces into individual words before writing it out.

tweets = [['user1','text of tweet 1'],['user2','text of tweet2']]

import itertools
for tweet in tweets:
    print(list(itertools.chain([tweet[0]], tweet[1].split(' '))))

['user1', 'text', 'of', 'tweet', '1']
['user2', 'text', 'of', 'tweet2']
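
One caveat, offered as a sketch rather than as part of the answer itself: split(' ') splits on single spaces only, so a tweet containing a double space or a line break would produce empty or merged cells. Calling split() with no argument splits on any run of whitespace instead. The sample text below is made up for illustration.

```python
# Hypothetical tweet text with a double space and a line break.
text = "text  of\ntweet 1"

# Splitting on a single space keeps an empty string for the double space
# and leaves the newline inside one of the "words".
by_space = text.split(' ')

# Splitting with no argument treats any run of whitespace as one separator.
by_whitespace = text.split()

print(by_space)
print(by_whitespace)
```

Which variant fits depends on whether the classifier should see empty cells; for word features, split() is probably the safer default.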

Try this in your code, in place of the current outtweets.append call:

outtweets.append(list(itertools.chain([tweet.user.name, tweet.created_at], tweet.text.encode("utf-8").split(' '))))

The above code builds two lists, one with the existing attributes and one with the words of the tweet text, and then chains them into a single list.
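
A note for Python 3, where this approach raises the TypeError quoted in EDIT 2 of the question: there, encode("utf-8") returns bytes, and bytes cannot be split with a str separator such as ' '. A minimal sketch of one way around it is to split the str directly and let the output file's encoding take care of UTF-8. The helper name tweet_to_row and the sample data are hypothetical, and io.StringIO stands in for the real output file.

```python
import csv
import io
import itertools


def tweet_to_row(name, created_at, text):
    """Return [name, created_at, word1, word2, ...] for one tweet."""
    # Split the str itself. Encoding first would give bytes, and
    # bytes.split(' ') raises "a bytes-like object is required, not 'str'".
    return list(itertools.chain([name, created_at], text.split(' ')))


# Hypothetical sample data standing in for the tweepy results.
rows = [
    tweet_to_row("user1", "2017-01-01 10:00:00", "text of tweet 1"),
    tweet_to_row("user2", "2017-01-01 11:00:00", "text of tweet2"),
]

# In-memory buffer used here instead of a file on disk.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, open('%s_tweets.csv' % screen_name, 'w', newline='', encoding='utf-8') would take the place of the buffer. Note that the rows no longer have the same length, so the fixed three-column header from the question will not line up with the word columns.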
