babygroot

Reputation: 19

Find all hashtags

I have a Tweet class and a list that contains all the Tweet objects. Each tweet also has a user, a retweet count, and an age, but those are not relevant to my question. Only the content matters.

I need to get a list of all the hashtags, but my code only returns the ones from the first tweet.

for x in tweets:
    return re.findall(r'#\w+', x.content)

Upvotes: 0

Views: 95

Answers (1)

jprebys

Reputation: 2516

You are returning after the first iteration of the loop. You need to go through all the tweets and add the hashtags to a list:

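import re

def get_hashtags(tweets):
    result = []
    for x in tweets:
        # findall returns a list, so extend keeps result flat
        result.extend(re.findall(r'#\w+', x.content))
    # return only after every tweet has been processed
    return result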
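
As a rough illustration of the same idea, here is a self-contained sketch that uses a list comprehension instead of `extend`. The `Tweet` dataclass and the sample tweets are assumptions made up for this example (the real class has more fields), and `all_hashtags` is just a name chosen here:

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical stand-in for the asker's class; only `content` is used.
    content: str


def all_hashtags(tweets):
    # Flatten the per-tweet matches into one list, in tweet order.
    return [tag for t in tweets for tag in re.findall(r'#\w+', t.content)]


# Made-up sample data for the demonstration.
tweets = [
    Tweet("Learning #python and #regex"),
    Tweet("no tags here"),
    Tweet("More #python"),
]
hashtags = all_hashtags(tweets)
print(hashtags)  # ['#python', '#regex', '#python']
```

Duplicates are kept, so a hashtag that appears in two tweets shows up twice. Wrapping the result in `set(...)` would drop them if you only need the distinct hashtags.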

For sorting, you can use a defaultdict to add up the retweets per hashtag, then sort by that total.

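import re
from collections import defaultdict

def get_hashtags_sorted(tweets):
    # Sum the retweets of every tweet in which each hashtag appears.
    result = defaultdict(int)
    for x in tweets:
        for hashtag in re.findall(r'#\w+', x.content):
            result[hashtag] += x.retweets
    # Sort the (hashtag, total) pairs by total, ascending;
    # pass reverse=True to list the most retweeted hashtags first.
    return sorted(result.items(), key=lambda item: item[1])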
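
To see what the retweet totals look like on concrete data, here is a small runnable sketch. The `Tweet` dataclass, the sample tweets, and the name `hashtag_retweet_totals` are assumptions invented for this example, and it sorts in descending order, which may or may not be what you want:

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical stand-in for the asker's class, with only the two
    # attributes the sorting needs.
    content: str
    retweets: int


def hashtag_retweet_totals(tweets):
    totals = defaultdict(int)
    for t in tweets:
        for tag in re.findall(r'#\w+', t.content):
            totals[tag] += t.retweets
    # Most retweeted hashtag first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data for the demonstration.
sample = [
    Tweet("#python is fun", 5),
    Tweet("#regex with #python", 2),
    Tweet("plain text", 9),
]
ranked = hashtag_retweet_totals(sample)
print(ranked)  # [('#python', 7), ('#regex', 2)]
```

The tweet without hashtags contributes nothing, and `#python` gets 5 + 2 = 7 because it appears in two tweets.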

Upvotes: 1
