Reputation: 173
I have this dataframe with 11 million rows:
I want to count how many users posted the same number of tweets, using the 'user_id'
column, and then plot a histogram (y-axis: number of users, x-axis: number of tweets).
I tried this:
user_tweet_df.groupby('tweet_count').count()
This didn't work. Can anyone please help? Thank you.
Upvotes: 0
Views: 706
Reputation: 1288
See if the code below works for you. It groups the rows by 'tweet_count' and counts the users in each group. Use the pandas docs on visualization to customize your graph as needed.
import matplotlib.pyplot as plt
import pandas as pd
from tabulate import tabulate

# Sample data: one row per user, with that user's total tweet count
tweets_df = pd.DataFrame({'user_id':[312,412,521,577,614,753,965,989],
                          'user_name':['Mary','Bob','Hans','Nicole','Chris','Matt','Carol','Khan'],
                          'tweet_count':[207,35,35,1,2,1,1,15]})
print(tabulate(tweets_df, headers='keys'), '\n')

# Count how many users share each tweet_count value
grouped_df = tweets_df.groupby('tweet_count').count()[['user_id']]
print(tabulate(grouped_df, headers='keys'), '\n')

# Bar chart: x-axis = number of tweets, y-axis = number of users
grouped_df.plot(kind='bar')
plt.show()
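If your real dataframe has one row per tweet rather than a precomputed 'tweet_count' column (an assumption on my part, since the question mentions using 'user_id'), you could derive the counts by applying value_counts twice. This is only a minimal sketch with made-up data; the names tweets, tweets_per_user, users_per_count and plot_users_per_count are hypothetical and used for illustration.

```python
import pandas as pd

# Hypothetical data: one row per tweet, identified only by user_id.
tweets = pd.DataFrame({'user_id': [312, 312, 312, 412, 412, 521, 577, 614, 614]})

# Step 1: tweets per user (index: user_id, values: number of tweets).
tweets_per_user = tweets['user_id'].value_counts()

# Step 2: users per tweet count (index: number of tweets, values: number of users).
users_per_count = tweets_per_user.value_counts().sort_index()


def plot_users_per_count(counts):
    """Bar chart with number of tweets on the x-axis and number of users on the y-axis."""
    import matplotlib.pyplot as plt  # imported here so the counting works without matplotlib

    ax = counts.plot(kind='bar')
    ax.set_xlabel('number of tweets')
    ax.set_ylabel('number of users')
    plt.show()
```

Calling plot_users_per_count(users_per_count) draws the chart. With 11 million rows there may be many distinct tweet counts, so it might be worth binning the values or using a log-scaled axis, depending on how the data is distributed.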
Upvotes: 1