Reputation: 596
This is probably a very silly question, but I'll go ahead and ask anyway. How would you increment a counter only the first time a particular value is reached?
For example, suppose 'step' is a column of df, and I want to add a column called 'counter' that increments each time the 'step' column first reaches a value of 6 (that is, at the start of each run of 6s).
Upvotes: 2
Views: 458
Reputation: 862641
Use:
import pandas as pd

df = pd.DataFrame({'step':[2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]})
a = df['step'] == 6
b = (~a).shift()
b[0] = a[0]
df['counter'] = (a & b).cumsum()
print (df)
step counter
0 2 0
1 2 0
2 2 0
3 3 0
4 4 0
5 4 0
6 5 0
7 6 1
8 6 1
9 6 1
10 6 1
11 7 1
12 5 1
13 6 2
14 6 2
15 6 2
16 7 2
17 5 2
18 6 3
19 7 3
20 5 3
Explanation:
Get a boolean mask by comparing with 6:
a = df['step'] == 6
Invert the Series and shift it:
b = (~a).shift()
If the first value is 6, the first group would be missed, so set the first value of b from the first value of a:
b[0] = a[0]
Chain the conditions with bitwise AND (&):
c = a & b
Get cumulative sum:
d = c.cumsum()
print (pd.concat([df['step'], a, b, c, d], axis=1, keys=['step', 'a', 'b', 'c', 'd']))
step a b c d
0 2 False False False 0
1 2 False True False 0
2 2 False True False 0
3 3 False True False 0
4 4 False True False 0
5 4 False True False 0
6 5 False True False 0
7 6 True True True 1
8 6 True False False 1
9 6 True False False 1
10 6 True False False 1
11 7 False False False 1
12 5 False True False 1
13 6 True True True 2
14 6 True False False 2
15 6 True False False 2
16 7 False False False 2
17 5 False True False 2
18 6 True True True 3
19 7 False False False 3
20 5 False True False 3
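The steps above can also be written without assigning into b afterwards. This is a minimal sketch, assuming pandas 0.24 or later (where Series.shift accepts fill_value); the names is_six, prev_is_six and run_start are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'step': [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]})

# True where the current row is 6
is_six = df['step'] == 6
# True where the previous row was 6; the first row has no predecessor, so fill with False
prev_is_six = is_six.shift(fill_value=False)
# A run of 6s starts where the current row is 6 and the previous one was not
run_start = is_six & ~prev_is_six
df['counter'] = run_start.cumsum()
print(df['counter'].tolist())
```

Filling the shifted Series with False keeps it boolean, so a 6 in the very first row is counted without a separate assignment.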
If performance is important, use a numpy solution:
import numpy as np

a = (df['step'] == 6).values
b = np.insert((~a)[:-1], 0, a[0])
df['counter1'] = np.cumsum(a & b)
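If the target value is not always 6, the numpy idea can be wrapped in a small helper. This is only a sketch: count_run_starts and its parameters are hypothetical names, and it builds the "previous element" mask with np.concatenate rather than np.insert.

```python
import numpy as np

def count_run_starts(values, target):
    """Return a running count of how many runs of `target` have started so far."""
    hit = np.asarray(values) == target
    # Flag of the previous element; the first element has no predecessor
    prev_hit = np.concatenate(([False], hit[:-1]))
    # A run starts where the current element hits and the previous one did not
    return np.cumsum(hit & ~prev_hit)

steps = [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]
counter = count_run_starts(steps, 6)
print(counter.tolist())
```

With a DataFrame, the result could be assigned directly, for example df['counter'] = count_run_starts(df['step'].values, 6).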
Upvotes: 2
Reputation: 9081
You can use .shift() in pandas. Notice how you only want to increment if the value of df['step'] is 6 and the value of df.shift(1)['step'] is not 6.
df['counter'] = ((df['step']==6) & (df.shift(1)['step']!=6 )).cumsum()
print(df)
Output
step counter
0 2 0
1 2 0
2 2 0
3 3 0
4 4 0
5 4 0
6 5 0
7 6 1
8 6 1
9 6 1
10 6 1
11 7 1
12 5 1
13 6 2
14 6 2
15 6 2
16 7 2
17 5 2
18 6 3
19 7 3
20 5 3
Explanation
a. df['step']==6 gives boolean values: True if the step is 6
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 True
8 True
9 True
10 True
11 False
12 False
13 True
14 True
15 True
16 False
17 False
18 True
19 False
20 False
Name: step, dtype: bool
b. df.shift(1)['step']!=6 shifts the data by 1 row and then checks whether the value is not equal to 6.
When both of these conditions are satisfied, you want to increment; .cumsum() will take care of that. Hope that helps!
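One detail worth seeing on a small example is the first row: shifting leaves NaN there, and NaN != 6 evaluates to True, so a 6 in row 0 still counts as a first occurrence. A minimal sketch on made-up data (the four-row frame and the names prev and prev_not_six are illustrative):

```python
import pandas as pd

# Hypothetical data that starts with a 6
df = pd.DataFrame({'step': [6, 6, 5, 6]})

prev = df['step'].shift(1)      # first row becomes NaN
prev_not_six = prev != 6        # NaN != 6 evaluates to True
df['counter'] = ((df['step'] == 6) & prev_not_six).cumsum()
print(df['counter'].tolist())
```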
P.S - Although it's a good question, going forward please avoid pasting images. You can directly paste data and format as code. Helps the people who are answering to copy-paste
Upvotes: 2
Reputation: 7364
If your DataFrame is called df, you can loop over the rows:
import pandas as pd

q_list = [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]
df = pd.DataFrame(q_list, columns=['step'])
df['counter'] = 0
counter = 0
flag = False  # True while we are inside a run of 6s
for index, row in df.iterrows():
    if row['step'] == 6 and not flag:
        counter += 1
        flag = True
    elif row['step'] != 6 and flag:
        flag = False
    # set_value was removed in pandas 1.0; .at sets a single cell
    df.at[index, 'counter'] = counter
Upvotes: 0
Reputation: 8754
If your DataFrame is called df, one possible way without iteration is
df['counter'] = 0
df.loc[1:, 'counter'] = ((df['step'].values[1:] == 6) & (df['step'].values[:-1] != 6)).cumsum()
This creates two boolean arrays, the conjunction of which is True when the previous row did not contain a 6 and the current row does contain a 6. The cumulative sum of this array gives the counter.
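Because the slices start at row 1, row 0 always keeps its initial 0, so a 6 in the very first row would not be counted. A possible variant that handles that case is sketched below; the data and the names is_six and run_start are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data that starts with a 6, to exercise the first-row case
df = pd.DataFrame({'step': [6, 6, 5, 6, 7]})

is_six = df['step'].values == 6
run_start = np.empty(len(is_six), dtype=bool)
run_start[0] = is_six[0]                     # first row: a 6 here opens a run
run_start[1:] = is_six[1:] & ~is_six[:-1]    # later rows: 6 now, not 6 before
df['counter'] = run_start.cumsum()
print(df['counter'].tolist())
```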
Upvotes: 1
Reputation: 20147
That's not a silly question. To get the desired output in your counter column, you can try (for example) this:
steps = [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]
counter = [idx for idx in range(len(steps)) if steps[idx] == 6 and (idx==0 or steps[idx-1] != 6)]
print(counter)
results in:
>> [7, 13, 18]
which are the indices in steps where a first 6 occurred. You can now get the total number of times that has happened with len(counter), or reproduce the second column exactly as you have given it with
counter_column = [0]
for idx in range(len(steps)):
    counter_column.append(counter_column[-1])
    if idx in counter:
        counter_column[-1] += 1
counter_column = counter_column[1:]  # drop the initial seed 0 so the length matches steps
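The same column can also be built in one pass with the standard library. This is a sketch of an alternative rather than the approach above: it pairs each value with its predecessor and uses itertools.accumulate for the running total; flags is an illustrative name.

```python
from itertools import accumulate

steps = [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]

# Pair each value with its predecessor; None stands in for "no previous value"
flags = [int(cur == 6 and prev != 6) for prev, cur in zip([None] + steps, steps)]
# Running total of run starts, one entry per element of steps
counter_column = list(accumulate(flags))
print(counter_column)
```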
Upvotes: 0