elnino

Reputation: 173

Pandas groupby count one column against the other column

I have this dataframe with 11 million rows:

[Image: Dataframe]

I want to use the 'user_id' column to count how many users posted the same number of tweets, and then plot a histogram (y-axis: number of users, x-axis: number of tweets).

I tried this:

user_tweet_df.groupby('tweet_count').count()

This didn't work. Can anyone please help? Thank you.

Upvotes: 0

Views: 706

Answers (1)

Ilya Berdichevsky

Reputation: 1288

See if the code below works for you. Use the pandas visualization docs to customize the graph as needed.

import matplotlib.pyplot as plt
import pandas as pd
from tabulate import tabulate

# Sample data: one row per user, with that user's total tweet count
tweets_df = pd.DataFrame({'user_id': [312, 412, 521, 577, 614, 753, 965, 989],
                          'user_name': ['Mary', 'Bob', 'Hans', 'Nicole', 'Chris', 'Matt', 'Carol', 'Khan'],
                          'tweet_count': [207, 35, 35, 1, 2, 1, 1, 15]})
print(tabulate(tweets_df, headers='keys'), '\n')

# Count how many users share each tweet_count value
grouped_df = tweets_df.groupby('tweet_count')[['user_id']].count()
print(tabulate(grouped_df, headers='keys'), '\n')

# Bar chart: x-axis is the number of tweets, y-axis is the number of users
grouped_df.plot(kind='bar')
plt.show()
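The sample above assumes one row per user with a ready-made 'tweet_count' column. If the 11-million-row frame instead has one row per tweet (an assumption, since the original screenshot is not available), a rough sketch would count twice: first tweets per user, then users per tweet count. The frame name `tweets` and its data below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one row per tweet, identified only by the author's user_id
tweets = pd.DataFrame({'user_id': [312, 312, 312, 412, 412, 521, 521, 577]})

# Step 1: tweets per user (index: user_id, values: number of tweets)
tweets_per_user = tweets['user_id'].value_counts()

# Step 2: users per tweet count (index: number of tweets, values: number of users)
users_per_count = tweets_per_user.value_counts().sort_index()

print(users_per_count)
```

The resulting Series can be plotted the same way as in the answer, for example with `users_per_count.plot(kind='bar')` followed by `plt.show()`.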


Upvotes: 1
