jonboy
jonboy

Reputation: 368

Running count on a pandas df

I am trying to return a running count to a separate column in a pandas df. There are two columns in the df below. When the integer increases I want to return an increased count to a separate column that is relative to the Val Column.

import pandas as pd

d = ({
    'Val' : ['Foo','Bar','Foo','Foo','Foo','Bar','Foo','Bar'],
    'Int' : [0,1,2,2,3,4,4,5],
    }) 

df = pd.DataFrame(d)

Out:

   Val  Int
0  Foo    0
1  Bar    1
2  Cat    2
3  Foo    2
4  Foo    3
5  Bar    4
6  Foo    4
7  Bar    5

When I try return a running count it just returns NaN's

for val in ['Foo','Bar']:
    cond = df.Val.eq(val) & df.Int.eq(int)
    df.loc[cond, 'Count_' + val] = cond[cond].cumsum()

Intended Output:

   Val  Int  Foo_Count  Bar_Count
0  Foo    0          0          0
1  Bar    1          0          1
2  Foo    2          1          1
3  Foo    2          1          1
4  Foo    3          2          1
5  Bar    4          2          2
6  Foo    4          2          2
7  Bar    5          2          3

Upvotes: 1

Views: 396

Answers (1)

BENY
BENY

Reputation: 323236

By using get_dummies with cumsum

s=df.Val.str.get_dummies().cumsum().add_suffix('_count')
s
Out[742]: 
   Bar_count  Foo_count
0          0          1
1          1          1
2          1          2
3          1          3
4          1          4
5          2          4
6          2          5
7          3          5
df=pd.concat([df,s],axis=1)

Upvotes: 5

Related Questions