Reputation: 582
I have a matrix with 0s and 1s, and want to do a cumsum on each column that resets to 0 whenever a zero is observed. For example, if we have the following:
df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]],columns = ['a','b'])
print(df)
a b
0 0 1
1 1 1
2 0 1
3 1 0
4 1 1
5 0 1
The result I desire is:
print(df)
a b
0 0 1
1 1 2
2 0 3
3 1 0
4 2 1
5 0 2
However, when I try df.cumsum() * df
, I am able to correctly identify the 0 elements, but the counter does not reset:
print(df.cumsum() * df)
a b
0 0 1
1 1 2
2 0 3
3 2 0
4 3 4
5 0 5
Upvotes: 18
Views: 16671
Reputation: 1956
You may also try the following naive but reliable approach.
Per every column - create groups to count within. Group starts once sequential value difference by row appears and lasts while value is being constant: (x != x.shift()).cumsum()
.
Example:
a b
0 1 1
1 2 1
2 3 1
3 4 2
4 4 3
5 5 3
Calculate cummulative sums within groups per columns using pd.DataFrame
's apply
and groupby
methods and you get cumsum with the zero reset in one line:
import pandas as pd
df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]], columns = ['a','b'])
cs = df.apply(lambda x: x.groupby((x != x.shift()).cumsum()).cumsum())
print(cs)
a b
0 0 1
1 1 2
2 0 3
3 1 0
4 2 1
5 0 2
Upvotes: 2
Reputation: 25518
A slightly hacky way would be to identify the indices of the zeros and set the corresponding values to the negative of those indices before doing the cumsum
:
import pandas as pd
df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]],columns = ['a','b'])
z = np.where(df['b']==0)
df['b'][z[0]] = -z[0]
df['b'] = np.cumsum(df['b'])
df
a b
0 0 1
1 1 2
2 0 3
3 1 0
4 1 1
5 0 2
Upvotes: 1
Reputation: 862611
You can use:
a = df != 0
df1 = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)
print (df1)
a b
0 0 1
1 1 2
2 0 3
3 1 0
4 2 1
5 0 2
Upvotes: 31
Reputation: 323226
Try this
df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]],columns = ['a','b'])
df['groupId1']=df.a.eq(0).cumsum()
df['groupId2']=df.b.eq(0).cumsum()
New=pd.DataFrame()
New['a']=df.groupby('groupId1').a.transform('cumsum')
New['b']=df.groupby('groupId2').b.transform('cumsum')
New
Out[1184]:
a b
0 0 1
1 1 2
2 0 3
3 1 0
4 2 1
5 0 2
Upvotes: 10