Pandas groupby count one column against the other column

Question

I have this dataframe with 11 million rows:

I want to count how many users posted the same number of tweets, using the 'user_id' column, and then plot a histogram (y-axis: number of users, x-axis: number of tweets).

I tried this:

user_tweet_df.groupby('tweet_count').count()

This didn't work. Can anyone please help? Thank you.

Ilya Berdichevsky · Accepted Answer

See if the code below works for you. Refer to the pandas visualization docs to customize the graph as needed.

import matplotlib.pyplot as plt
import pandas as pd
from tabulate import tabulate

# Small sample frame standing in for the real 11-million-row data
tweets_df = pd.DataFrame({'user_id':[312,412,521,577,614,753,965,989],
                    'user_name':['Mary','Bob','Hans','Nicole','Chris','Matt','Carol','Khan'],
                    'tweet_count':[207,35,35,1,2,1,1,15]})
print(tabulate(tweets_df, headers='keys'), '\n')

# For each distinct tweet_count, count the user_id values that share it
grouped_df = tweets_df.groupby('tweet_count')[['user_id']].count()
print(tabulate(grouped_df, headers='keys'), '\n')

# x-axis: number of tweets, y-axis: number of users
grouped_df.plot(kind='bar')
plt.show()
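
As a possible alternative, the same tally can be sketched with value_counts() on the 'tweet_count' column. This is only a sketch and assumes each user appears in exactly one row, as in the sample data above; if the real frame has several rows per user, it would need to be aggregated per 'user_id' first. The helper name plot_users_per_count is made up for illustration.

```python
import pandas as pd

# Same toy data as in the answer above (user_name omitted; not needed for counting).
tweets_df = pd.DataFrame({
    'user_id': [312, 412, 521, 577, 614, 753, 965, 989],
    'tweet_count': [207, 35, 35, 1, 2, 1, 1, 15],
})

# value_counts() tallies how many rows (users) share each tweet_count;
# sort_index() orders the result by tweet count, ready for the x-axis.
users_per_count = tweets_df['tweet_count'].value_counts().sort_index()
print(users_per_count)


def plot_users_per_count(counts):
    """Hypothetical helper: bar chart of users per tweet count.

    matplotlib is imported here so the counting step runs without it.
    """
    import matplotlib.pyplot as plt

    ax = counts.plot(kind='bar')
    ax.set_xlabel('number of tweets')
    ax.set_ylabel('number of users')
    plt.show()
```

Calling plot_users_per_count(users_per_count) should draw the same kind of bar chart as grouped_df.plot(kind='bar'). With millions of rows and many distinct tweet counts, a bar per value may get crowded, so binning the counts (for example with tweets_df['tweet_count'].plot(kind='hist', bins=50)) might be easier to read.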
