Reputation: 367
I want to create a column that increments up by 1 for every row that is not NaT in diffs. If the value is NaT, I want the increment to reset
Below is an example dataframe:
x y min z o diffs
0 0 0 0 1 1 NaT
1 0 0 0 2 1 00:00:01
2 0 0 0 6 1 00:00:04
3 0 0 0 11 1 00:00:05
4 0 0 0 14 0 NaT
5 0 0 2 18 0 NaT
6 0 0 2 41 1 NaT
7 0 0 2 42 0 NaT
8 0 0 8 13 1 00:00:54
9 0 0 8 16 1 00:00:03
10 0 0 8 17 1 00:00:01
11 0 0 8 20 0 NaT
12 0 0 8 32 1 NaT
This is my expected output:
x y min z o diffs increment
0 0 0 0 1 1 NaT 0
1 0 0 0 2 1 00:00:01 1
2 0 0 0 6 1 00:00:04 2
3 0 0 0 11 1 00:00:05 3
4 0 0 0 14 0 NaT 0
5 0 0 2 18 0 NaT 0
6 0 0 2 41 1 NaT 0
7 0 0 2 42 0 NaT 0
8 0 0 8 13 1 00:00:54 1
9 0 0 8 16 1 00:00:03 2
10 0 0 8 17 1 00:00:01 3
11 0 0 8 20 0 NaT 0
12 0 0 8 32 1 NaT 0
Upvotes: 1
Views: 1616
Reputation: 863611
Use numpy.where
with set not missing values to counter by cumcount
with consecutive non missing groups:
m = df['diffs'].notnull()
df['increment'] = np.where(m, df.groupby(m.ne(m.shift()).cumsum()).cumcount()+1, 0)
print (df)
x y min z o diffs increment
0 0 0 0 1 1 NaT 0
1 0 0 0 2 1 00:00:01 1
2 0 0 0 6 1 00:00:04 2
3 0 0 0 11 1 00:00:05 3
4 0 0 0 14 0 NaT 0
5 0 0 2 18 0 NaT 0
6 0 0 2 41 1 NaT 0
7 0 0 2 42 0 NaT 0
8 0 0 8 13 1 00:00:54 1
9 0 0 8 16 1 00:00:03 2
10 0 0 8 17 1 00:00:01 3
11 0 0 8 20 0 NaT 0
12 0 0 8 32 1 NaT 0
If performance is important, alternative solution:
b = m.cumsum()
df['increment'] = b-b.mask(m).ffill().fillna(0).astype(int)
Upvotes: 3