Reputation: 527
Column x in my dataframe contains only 0 and 1. I want to create a variable y that counts consecutive zeros and resets when a 1 appears in x. I'm getting the error "The truth value of a Series is ambiguous."
count=1
countList=[0]
for x in df['x']:
    if df['x'] == 0:
        count = count + 1
        df['y']= count
    else:
        df['y'] = 1
        count = 1
Upvotes: 1
Views: 195
Reputation: 863031
First, avoid looping in pandas when a vectorized solution exists, because loops are slow.
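For reference, the error in the question comes from `if df['x'] == 0:`, which compares the whole column and produces a boolean Series, while `if` needs a single True/False. If a loop is still wanted, a minimal sketch could look like the following. It assumes the convention of the expected output in this answer (0 on rows where x is 1, a running count on rows where x is 0), and the list name `counts` is my own, not from the question:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1]})

count = 0
counts = []  # hypothetical helper list, one entry per row
for value in df['x']:  # value is a single number, so the comparison is a single boolean
    if value == 0:
        count += 1     # extend the current run of zeros
    else:
        count = 0      # a non-zero value resets the counter
    counts.append(count)

# assign the whole column once, instead of overwriting df['y'] on every iteration
df['y'] = counts
print(df['y'].tolist())
```

This is only meant to show where the original loop went wrong; the vectorized version is the one I would use on large data.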
I think you need to count consecutive 0 values:
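import pandas as pd

df = pd.DataFrame({'x':[1,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1]})
# True where x is 0
a = df['x'].eq(0)
# running count of zeros
b = a.cumsum()
# subtract the count reached at the last non-zero row
df['y'] = b - b.mask(a).ffill().fillna(0).astype(int)
print(df)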
    x  y
0   1  0
1   0  1
2   0  2
3   1  0
4   1  0
5   0  1
6   1  0
7   0  1
8   0  2
9   0  3
10  1  0
11  1  0
12  0  1
13  0  2
14  0  3
15  0  4
16  1  0
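Another way to get the same column, as a sketch rather than the method used in this answer, is to start a new group at every non-zero value and take a cumulative count of zeros inside each group. The names `groups` and `y_alt` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1]})

# each non-zero value increments the group id, so a run of zeros
# shares the id of the non-zero row that precedes it
groups = df['x'].ne(0).cumsum()

# 1 for zeros, 0 otherwise, summed cumulatively within each group
df['y_alt'] = df['x'].eq(0).astype(int).groupby(groups).cumsum()
print(df['y_alt'].tolist())
```

Leading zeros (before the first non-zero value) fall into group 0, so they are counted from 1 as well.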
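Detail and explanation (the output below comes from a sample whose x starts with 0, 0, 0 rather than 1, 0, 0, which shows why fillna(0) is needed):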
#compare with zero: True where x is 0
a = df['x'].eq(0)
#cumulative sum of the boolean mask (running count of zeros)
b = a.cumsum()
#replace values at True positions (zeros in x) with NaN
c = b.mask(a)
#forward fill the NaNs
d = b.mask(a).ffill()
#replace the remaining leading NaNs with 0 and cast to integers
e = b.mask(a).ffill().fillna(0).astype(int)
#subtract from the cumulative sum Series
y = b - e
df = pd.concat([df['x'], a, b, c, d, e, y], axis=1, keys=('x','a','b','c','d','e', 'y'))
print (df)
    x      a   b     c     d   e  y
0   0   True   1   NaN   NaN   0  1
1   0   True   2   NaN   NaN   0  2
2   0   True   3   NaN   NaN   0  3
3   1  False   3   3.0   3.0   3  0
4   1  False   3   3.0   3.0   3  0
5   0   True   4   NaN   3.0   3  1
6   1  False   4   4.0   4.0   4  0
7   0   True   5   NaN   4.0   4  1
8   0   True   6   NaN   4.0   4  2
9   0   True   7   NaN   4.0   4  3
10  1  False   7   7.0   7.0   7  0
11  1  False   7   7.0   7.0   7  0
12  0   True   8   NaN   7.0   7  1
13  0   True   9   NaN   7.0   7  2
14  0   True  10   NaN   7.0   7  3
15  0   True  11   NaN   7.0   7  4
16  1  False  11  11.0  11.0  11  0
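To reproduce the y column of the table above, the steps can be wrapped in a small function and run on an x that starts with zeros. This is a sketch: the function name `count_consecutive_zeros` and the variable names are mine, and the input list is read off the x column of the table:

```python
import pandas as pd

def count_consecutive_zeros(s):
    """Running count of consecutive zeros in s, reset to 0 at every non-zero value."""
    a = s.eq(0)                                # True where the value is 0
    b = a.cumsum()                             # running count of all zeros so far
    e = b.mask(a).ffill().fillna(0).astype(int)  # count reached at the last non-zero row
    return b - e

# x column taken from the detail table (starts with three zeros)
x_detail = pd.Series([0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1])
y_detail = count_consecutive_zeros(x_detail)
print(y_detail.tolist())
# [1, 2, 3, 0, 0, 1, 0, 1, 2, 3, 0, 0, 1, 2, 3, 4, 0]
```

Without fillna(0) the first three rows would stay NaN, because there is no earlier non-zero row to forward fill from.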
Upvotes: 3