Reputation: 9
I am trying to add a column to my dataframe, which will hold a value which represents the number of times a unique value has appeared in another column.
For example , I haver the following dataframe:
Date|Team|Goals|
22.08.20|Team1|4|
22.08.20|Team2|3|
22.08.20|Team3|1|
22.09.20|Team1|4|
22.09.20|Team3|5|
I would like to add a counter column, which counts how often each team appears:
Date|Team|Goals|Count|
22.08.20|Team1|4|1|
22.08.20|Team2|3|1|
22.08.20|Team3|1|1|
22.09.20|Team1|4|2|
22.09.20|Team3|5|2|
My Dataframe is ordered by date, so the teams should appear in the correct order.
Apologies, very new to pandas and stack overflow, so please let me know if I can format this question differently. Thanks
Upvotes: 0
Views: 48
Reputation: 631
Another answer building upon @Nk03's with replicable results:
import pandas as pd
import numpy as np
# Set numpy random seed
np.random.seed(42)
# Create dates array
dates = pd.date_range(start='2021-06-01', periods=10, freq='D')
# Create teams array
teams_names = ['Team 1', 'Team 2', 'Team 3']
teams = [teams_names[i] for i in np.random.randint(0, 3, 10)]
# Create goals array
goals = np.random.randint(1, 6, 10)
# Create DataFrame
data = pd.DataFrame({'Date': dates,
'Team': teams,
'Goals': goals})
# Cumulative count of teams
data['Count'] = data.groupby('Team').cumcount().add(1)
The output will be:
Date Team Goals Count
0 2021-06-01 Team 2 3 1
1 2021-06-02 Team 2 1 2
2 2021-06-03 Team 2 4 3
3 2021-06-04 Team 1 2 1
4 2021-06-05 Team 2 4 4
5 2021-06-06 Team 1 2 2
6 2021-06-07 Team 2 2 5
7 2021-06-08 Team 3 4 1
8 2021-06-09 Team 3 5 2
9 2021-06-10 Team 1 2 3
Upvotes: 0
Reputation: 14949
TRY:
df['Count'] = df.groupby('Team').cumcount().add(1)
OUTPUT:
Date Team Goals Count
0 22.08.20 Team1 4 1
1 22.08.20 Team2 3 1
2 22.08.20 Team3 1 1
3 22.09.20 Team1 4 2
4 22.09.20 Team3 5 2
Upvotes: 2