Reputation: 2669
I have a pandas dataframe which looks like:
df = pd.DataFrame(data={'id':[1234, 1234, 1234, 1234, 1234], 'year':['2017', '2017', '2018', '2018', '2018'], 'count_to_today':[1, 2, 3, 3, 4})
And I need to count how many times count_to_today
happens cumulatively in each year per id
. i.e.
counts = pd.DataFrame(data={'id':[1234, 1234, 1234, 1234, 1234], 'year':['2017', '2017', '2018', '2018', '2018'], 'count_to_today':[1, 2, 1, 1, 2]})
I.e. I have a running count since the beginning of time, and I want to count the number of times it increments cumulatively per year.
I'm a bit confused about how to do this. I know I need to groupby id
and year
but I can't figure out how to get .count()
or .value_counts()
to give me the counts per year.
Upvotes: 1
Views: 89
Reputation: 403030
Similar to your previous question, but use cumsum
instead:
df.count_to_today.diff().ne(0).groupby([df.id, df.year]).cumsum()
0 1.0
1 2.0
2 1.0
3 1.0
4 2.0
Name: count_to_today, dtype: float64
df['count_to_today'] = (
df.count_to_today.diff().ne(0).groupby([df.id, df.year]).cumsum().astype(int))
df
id year count_to_today
0 1234 2017 1
1 1234 2017 2
2 1234 2018 1
3 1234 2018 1
4 1234 2018 2
Upvotes: 1