Tom Kealy
Tom Kealy

Reputation: 2669

Counting the number of times a counter increments in a groupby

I have a pandas dataframe which looks like:

df = pd.DataFrame(data={'id':[1234, 1234, 1234, 1234, 1234], 'year':['2017', '2017', '2018', '2018', '2018'], 'count_to_today':[1, 2, 3, 3, 4})

And I need to count how many times count_to_today happens cumulatively in each year per id. i.e.

counts = pd.DataFrame(data={'id':[1234, 1234, 1234, 1234, 1234], 'year':['2017', '2017', '2018', '2018', '2018'], 'count_to_today':[1, 2, 1, 1, 2]})

I.e. I have a running count since the beginning of time, and I want to count the number of times it increments cumulatively per year.

I'm a bit confused about how to do this. I know I need to groupby id and year but I can't figure out how to get .count() or .value_counts() to give me the counts per year.

Upvotes: 1

Views: 89

Answers (1)

cs95
cs95

Reputation: 403030

Similar to your previous question, but use cumsum instead:

df.count_to_today.diff().ne(0).groupby([df.id, df.year]).cumsum()

0    1.0
1    2.0
2    1.0
3    1.0
4    2.0
Name: count_to_today, dtype: float64

df['count_to_today'] = (
    df.count_to_today.diff().ne(0).groupby([df.id, df.year]).cumsum().astype(int))
df

     id  year  count_to_today
0  1234  2017               1
1  1234  2017               2
2  1234  2018               1
3  1234  2018               1
4  1234  2018               2

Upvotes: 1

Related Questions