Thiago Petrone

Reputation: 313

Get most frequent elements across all groups in a pandas time series

How can one plot the counts of the n most frequent elements across all groups of a multi-group time series? Note that this is different from the n most frequent elements of each group, which could be obtained with count and nlargest.

Given a dataframe:

import pandas as pd

data = {'year': [2020, 2020, 2021, 2021, 2022], 
        'month': [1, 1, 2, 2, 3],
        'Name': ['name_1', 'name_2', 'name_1', 'name_2', 'name_1'], 
        'count': [10, 12, 8, 10, 2]}  

df = pd.DataFrame(data)

print(df)

which outputs

   year  month    Name  count
0  2020      1  name_1     10
1  2020      1  name_2     12
2  2021      2  name_1      8
3  2021      2  name_2     10
4  2022      3  name_1      2


I would like to plot only name_1's counts because, although it never has the larger count in a group it shares with name_2 and has a lower total count (20 vs 22), it appears in more groups (3 vs 2).
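To make the distinction concrete, here is a minimal sketch on the same sample data comparing three possible rankings: the winner of each (year, month) group, the largest total count, and the number of rows each Name appears in. The variable names are illustrative only, and `idxmax` is used here as one way to pick the per-group maximum.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'year': [2020, 2020, 2021, 2021, 2022],
    'month': [1, 1, 2, 2, 3],
    'Name': ['name_1', 'name_2', 'name_1', 'name_2', 'name_1'],
    'count': [10, 12, 8, 10, 2],
})

# Per-group winner: the Name with the largest count in each (year, month) group
per_group_top = df.loc[df.groupby(['year', 'month'])['count'].idxmax(), 'Name']

# Total count per Name, summed across all groups
total_count = df.groupby('Name')['count'].sum()

# Number of rows (groups) each Name appears in, across all groups
appearances = df['Name'].value_counts()

print(per_group_top.tolist())  # ['name_2', 'name_2', 'name_1']
print(total_count.idxmax())    # name_2
print(appearances.idxmax())    # name_1
```

Only the last measure (`value_counts`) singles out name_1, which is the "appears more times" criterion.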

Upvotes: 1

Views: 69

Answers (1)

mozway

Reputation: 260790

IIUC, you want to keep only the rows of the most common Name (the one that appears in the most rows) and plot its counts?

# get top Name
top = df['Name'].value_counts().index[0]

# filter
df2 = df[df['Name'].eq(top)]

# plot
(df2.assign(date=df2[['year', 'month']].astype(str).apply('_'.join, axis=1))
    .plot.bar(x='date', y='count')
)

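A possible variation, sketched under the assumption that a time-based x axis is wanted: the `'_'.join` labels (`'2020_1'`, ...) are plain strings, so each bar is just a category. Building a real timestamp with `pd.to_datetime` on the year/month columns (with a hypothetical `day=1` added to anchor each month) gives a datetime index, which a line plot spaces by actual time.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2020, 2020, 2021, 2021, 2022],
    'month': [1, 1, 2, 2, 3],
    'Name': ['name_1', 'name_2', 'name_1', 'name_2', 'name_1'],
    'count': [10, 12, 8, 10, 2],
})

# Most frequent Name (value_counts sorts by frequency, descending)
top = df['Name'].value_counts().index[0]

# pd.to_datetime accepts a DataFrame with 'year', 'month' and 'day' columns;
# day=1 is an assumption used only to turn each (year, month) into a date
dates = pd.to_datetime(df[['year', 'month']].assign(day=1))

# Counts of the top Name, indexed by date
series = (df.assign(date=dates)
            .loc[lambda d: d['Name'].eq(top)]
            .set_index('date')['count'])

print(series)
# series.plot(marker='o')  # needs matplotlib; points are spaced by real time
```

Whether bars per label or a time-spaced line reads better depends on how irregular the dates are.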

Several top values:
# get the 2 most frequent Names
top = df['Name'].value_counts().index[:2]

# filter and reshape
df2 = (df[df['Name'].isin(top)]
        .pivot(index=['year', 'month'],
               columns='Name',
               values='count')
      )

# plot
df2.plot.bar()

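The same idea can be wrapped in a small function taking n as a parameter. This is only a sketch: `top_n_counts` is a hypothetical name, it uses `value_counts().nlargest(n)` (equivalent to `.index[:n]` here, since `value_counts` is already sorted), and it assumes a pandas version whose `DataFrame.pivot` accepts a list for `index`, as the code above also does.

```python
import pandas as pd

def top_n_counts(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: counts of the n most frequent Names, one column per Name."""
    # n most frequent Names across all groups (by number of rows, not by 'count')
    top = frame['Name'].value_counts().nlargest(n).index
    # Keep only those Names and reshape: one row per (year, month), one column per Name
    return (frame[frame['Name'].isin(top)]
            .pivot(index=['year', 'month'], columns='Name', values='count'))

df = pd.DataFrame({
    'year': [2020, 2020, 2021, 2021, 2022],
    'month': [1, 1, 2, 2, 3],
    'Name': ['name_1', 'name_2', 'name_1', 'name_2', 'name_1'],
    'count': [10, 12, 8, 10, 2],
})

wide = top_n_counts(df, 2)
print(wide)
# wide.plot.bar()  # needs matplotlib
```

Where a Name has no row for a given (year, month), the pivot leaves NaN (name_2 in 2022-3 here), and that bar is simply not drawn; `.fillna(0)` would show it as zero instead, which may or may not be what you want.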

Upvotes: 1
