Reputation: 368
I am trying to return a running count to a separate column
in a pandas
df
. There are two columns
in the df
below. When the integer increases I want to return an increased count to a separate column
that is relative to the Val
Column
.
import pandas as pd
d = ({
'Val' : ['Foo','Bar','Foo','Foo','Foo','Bar','Foo','Bar'],
'Int' : [0,1,2,2,3,4,4,5],
})
df = pd.DataFrame(d)
Out:
Val Int
0 Foo 0
1 Bar 1
2 Cat 2
3 Foo 2
4 Foo 3
5 Bar 4
6 Foo 4
7 Bar 5
When I try return a running count it just returns NaN's
for val in ['Foo','Bar']:
cond = df.Val.eq(val) & df.Int.eq(int)
df.loc[cond, 'Count_' + val] = cond[cond].cumsum()
Intended Output:
Val Int Foo_Count Bar_Count
0 Foo 0 0 0
1 Bar 1 0 1
2 Foo 2 1 1
3 Foo 2 1 1
4 Foo 3 2 1
5 Bar 4 2 2
6 Foo 4 2 2
7 Bar 5 2 3
Upvotes: 1
Views: 396
Reputation: 323236
By using get_dummies
with cumsum
s=df.Val.str.get_dummies().cumsum().add_suffix('_count')
s
Out[742]:
Bar_count Foo_count
0 0 1
1 1 1
2 1 2
3 1 3
4 1 4
5 2 4
6 2 5
7 3 5
df=pd.concat([df,s],axis=1)
Upvotes: 5