Reputation: 735
I have a list of tweets that is grouped into sublists (chunks) of tweets, like so:
[[tweet1, tweet2, tweet3],[tweet4,tweet5,tweet6],[tweet7, tweet8, tweet9]]
I want to count the number of occurrences of each word within each subgroup. To do this, I need to split each tweet into individual words. I want to use something like str.split(' '), but I get an error:
AttributeError: 'list' object has no attribute 'split'
Is there a way to split each tweet into its individual words? The result should look something like:
[['word1', 'word2', 'word3', 'word2', 'word2'],['word1', 'word1', 'word3', 'word4', 'word5'],['word1', 'word3', 'word3', 'word5', 'word6']]
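A minimal reproduction, assuming each chunk holds plain strings (the sample tweets below are made up): .split is a string method, not a list method, so calling it on the outer list raises the error, while applying it to each tweet inside each chunk works.

```python
# Hypothetical sample data: a list of chunks, each chunk a list of tweet strings
chunks = [["hello world", "hello again"], ["good morning world"]]

error_message = None
try:
    chunks.split(' ')  # lists have no .split method
except AttributeError as exc:
    error_message = str(exc)

# Split every tweet within each chunk and collect that chunk's words in one list
words_per_chunk = [[word for tweet in chunk for word in tweet.split(' ')]
                   for chunk in chunks]

print(error_message)    # 'list' object has no attribute 'split'
print(words_per_chunk)  # [['hello', 'world', 'hello', 'again'], ['good', 'morning', 'world']]
```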
Upvotes: 3
Views: 115
Reputation: 79
You could write a function that takes your list and returns a dictionary mapping each word to the number of times it appears in your tweets.
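def countWords(listitem):
    # Collect every word from every tweet in every group
    a = []
    for x in listitem:              # each group of tweets
        for y in x:                 # each tweet in the group
            for z in y.split(' '):  # each word in the tweet
                a.append(z)
    # Tally how often each word occurs
    b = {}
    for word in a:
        if word not in b:
            b[word] = 1
        else:
            b[word] += 1
    return b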
This way your original list stays unchanged, and you can assign the return value to a new variable for inspection:
dictvar = countWords(listoftweets)
Defining a function also lets you put it in its own module that you can import and reuse in the future.
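Note that countWords merges all chunks into a single dictionary. If you want one dictionary per chunk, as the question asks, a minimal sketch (count_words_per_group is a hypothetical name) applies the same counting logic to each group separately:

```python
def count_words_per_group(groups):
    """Return a list with one {word: count} dictionary per group of tweets."""
    result = []
    for group in groups:
        counts = {}
        for tweet in group:
            for word in tweet.split(' '):
                # dict.get returns 0 for words not yet seen in this group
                counts[word] = counts.get(word, 0) + 1
        result.append(counts)
    return result


per_group = count_words_per_group([["a b", "b c"], ["c c"]])
print(per_group)  # [{'a': 1, 'b': 2, 'c': 1}, {'c': 2}]
```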
Upvotes: 0
Reputation: 198324
groups = [["foo bar", "bar baz"], ["foo foo"]]
[sum((tweet.split(' ') for tweet in group), []) for group in groups]
# => [['foo', 'bar', 'bar', 'baz'], ['foo', 'foo']]
EDIT: It seems an explanation is needed.
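For each group, [... for group in groups],
split every tweet into a list of words, (tweet.split(' ') for tweet in group),
then concatenate those lists, starting from an empty list, sum(..., []).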
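sum(..., []) creates a new list on every addition, so it gets slow (quadratic) for large groups; the Python docs for sum suggest itertools.chain for concatenating iterables. A sketch of the same flattening with itertools.chain.from_iterable, using the same sample data:

```python
from itertools import chain

groups = [["foo bar", "bar baz"], ["foo foo"]]

# chain.from_iterable yields the words of each tweet one after another,
# and list() collects them into a single flat list per group
flat = [list(chain.from_iterable(tweet.split(' ') for tweet in group))
        for group in groups]

print(flat)  # [['foo', 'bar', 'bar', 'baz'], ['foo', 'foo']]
```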
Upvotes: 1
Reputation: 180391
If you want to count the occurrences, use a collections.Counter, chaining all the words of each group with itertools.chain after splitting.
from collections import Counter
from itertools import chain
tweets = [['foo bar', 'foo foobar'], ['bar foo', 'bar']]
print([Counter(chain.from_iterable(map(str.split, sub))) for sub in tweets])
[Counter({'foo': 2, 'bar': 1, 'foobar': 1}), Counter({'bar': 2, 'foo': 1})]
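If you also want totals across all groups, Counter objects support addition. A sketch reusing the same sample data; the empty Counter() start value is needed because sum() otherwise starts from 0:

```python
from collections import Counter
from itertools import chain

tweets = [['foo bar', 'foo foobar'], ['bar foo', 'bar']]
per_group = [Counter(chain.from_iterable(map(str.split, sub))) for sub in tweets]

# Add the per-group counters together, starting from an empty Counter
total = sum(per_group, Counter())

print(total['foo'], total['bar'], total['foobar'])  # 3 3 1
```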
Upvotes: 1
Reputation: 28099
If you have a list of strings:
tweets = ['a tweet', 'another tweet']
then you can split each element using a list comprehension:
split_tweets = [tweet.split(' ')
                for tweet in tweets]
Since it's a list of lists of tweets:
tweet_groups = [['tweet 1', 'tweet 1b'], ['tweet 2', 'tweet 2b']]
tweet_group_words = [[word
                      for tweet in group
                      for word in tweet.split(' ')]
                     for group in tweet_groups]
This gives a list of lists of words.
If you only want the distinct words in each group, build a set instead:
words = [set(word
             for tweet in group
             for word in tweet.split(' '))
         for group in tweet_groups]
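To get how many distinct words each group contains, take the len of each set. A minimal sketch with the same sample data:

```python
tweet_groups = [['tweet 1', 'tweet 1b'], ['tweet 2', 'tweet 2b']]

# Build the set of distinct words per group, then count its members
distinct_counts = [len({word for tweet in group for word in tweet.split(' ')})
                   for group in tweet_groups]

print(distinct_counts)  # [3, 3]
```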
Upvotes: 6
Reputation: 32189
You want something like this:
l1 = [['a b', 'c d', 'e f'], ['a b', 'c d', 'e f'], ['a b', 'c d', 'e f']]
l2 = []
for i, j in enumerate(l1):
    l2.append([])
    for k in j:
        l2[i].extend(k.split())
print(l2)
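This answer calls k.split() with no argument, which behaves differently from the split(' ') used in the question: with no argument, any run of whitespace (spaces, tabs, newlines) counts as a single separator and leading/trailing whitespace is ignored, so no empty strings appear. A small illustration with a made-up tweet:

```python
tweet = "hello  world\tagain "

print(tweet.split(' '))  # ['hello', '', 'world\tagain', '']
print(tweet.split())     # ['hello', 'world', 'again']
```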
Upvotes: 1