jowen399
jowen399

Reputation: 9

How can I create a datframe column which counts the occurrence of each value in anopther column?

I am trying to add a column to my dataframe, which will hold a value which represents the number of times a unique value has appeared in another column.

For example , I haver the following dataframe:

Date|Team|Goals|

22.08.20|Team1|4|

22.08.20|Team2|3|

22.08.20|Team3|1|

22.09.20|Team1|4|

22.09.20|Team3|5|

I would like to add a counter column, which counts how often each team appears:

Date|Team|Goals|Count|

22.08.20|Team1|4|1|

22.08.20|Team2|3|1|

22.08.20|Team3|1|1|

22.09.20|Team1|4|2|

22.09.20|Team3|5|2|

My Dataframe is ordered by date, so the teams should appear in the correct order.

Apologies, very new to pandas and stack overflow, so please let me know if I can format this question differently. Thanks

Upvotes: 0

Views: 48

Answers (2)

Paulo Schau Guerra
Paulo Schau Guerra

Reputation: 631

Another answer building upon @Nk03's with replicable results:

import pandas as pd
import numpy as np

# Set numpy random seed
np.random.seed(42)

# Create dates array
dates = pd.date_range(start='2021-06-01', periods=10, freq='D')

# Create teams array
teams_names = ['Team 1', 'Team 2', 'Team 3']
teams = [teams_names[i] for i in np.random.randint(0, 3, 10)]

# Create goals array
goals = np.random.randint(1, 6, 10)

# Create DataFrame
data = pd.DataFrame({'Date': dates,
                     'Team': teams,
                     'Goals': goals})

# Cumulative count of teams
data['Count'] = data.groupby('Team').cumcount().add(1)

The output will be:

        Date    Team  Goals  Count
0 2021-06-01  Team 2      3      1
1 2021-06-02  Team 2      1      2
2 2021-06-03  Team 2      4      3
3 2021-06-04  Team 1      2      1
4 2021-06-05  Team 2      4      4
5 2021-06-06  Team 1      2      2
6 2021-06-07  Team 2      2      5
7 2021-06-08  Team 3      4      1
8 2021-06-09  Team 3      5      2
9 2021-06-10  Team 1      2      3

Upvotes: 0

Nk03
Nk03

Reputation: 14949

TRY:

df['Count'] = df.groupby('Team').cumcount().add(1)

OUTPUT:

      Date   Team  Goals  Count
0  22.08.20  Team1      4      1
1  22.08.20  Team2      3      1
2  22.08.20  Team3      1      1
3  22.09.20  Team1      4      2
4  22.09.20  Team3      5      2

Upvotes: 2

Related Questions