elnino

Reputation: 173

Pandas groupby count one column against the other column

I have this dataframe with 11 million rows:

[Image: Dataframe]

I want to count how many users tweeted the same number of tweets, using the 'user_id' column, and then plot a histogram (y-axis: number of users, x-axis: number of tweets).

I tried this:

user_tweet_df.groupby('tweet_count').count()

This didn't work. Can anyone please help? Thank you.

Upvotes: 0

Views: 709

Answers (1)

Ilya Berdichevsky

Reputation: 1298

See if the code below works for you. Use the pandas docs on visualization to customize the graph as needed.

import matplotlib.pyplot as plt
import pandas as pd
from tabulate import tabulate

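# Sample data: one row per user with that user's total tweet count
tweets_df = pd.DataFrame({'user_id': [312, 412, 521, 577, 614, 753, 965, 989],
                    'user_name': ['Mary', 'Bob', 'Hans', 'Nicole', 'Chris', 'Matt', 'Carol', 'Khan'],
                    'tweet_count': [207, 35, 35, 1, 2, 1, 1, 15]})
print(tabulate(tweets_df, headers='keys'), '\n')

# Count how many users share each tweet_count value
grouped_df = tweets_df.groupby('tweet_count')[['user_id']].count()
print(tabulate(grouped_df, headers='keys'), '\n')

# Bar chart: x-axis = number of tweets, y-axis = number of users
grouped_df.plot(kind='bar')
plt.show()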
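
If you only need the counts, `value_counts()` on the 'tweet_count' column may be a shorter route to the same numbers. This is a minimal sketch that assumes one row per user, as in the sample above; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the example above (one row per user).
tweets_df = pd.DataFrame({
    'user_id': [312, 412, 521, 577, 614, 753, 965, 989],
    'tweet_count': [207, 35, 35, 1, 2, 1, 1, 15],
})

# Number of users for each distinct tweet count, ordered by tweet count.
users_per_count = tweets_df['tweet_count'].value_counts().sort_index()
print(users_per_count)
```

Calling `users_per_count.plot(kind='bar')` followed by `plt.show()` should draw the same kind of bar chart as before.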

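
With 11 million rows, it is possible that your frame has one row per tweet rather than one row per user. That is only a guess, since the screenshot is not reproduced here. Under that assumption, a two-step count might look like the sketch below; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical per-tweet data: each row is one tweet, identified by its author.
tweets = pd.DataFrame({'user_id': [312, 312, 312, 412, 412, 521, 521, 577]})

# Step 1: tweets per user (size of each user_id group).
tweets_per_user = tweets.groupby('user_id').size()

# Step 2: users per tweet count (how often each count occurs).
users_per_count = tweets_per_user.value_counts().sort_index()
users_per_count.index.name = 'tweet_count'
print(users_per_count)
```

If the frame already has one row per user, skip step 1 and count the 'tweet_count' column directly.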

Upvotes: 1
