Reputation: 45
I have a dataframe that looks like this:
>>> data = {'Count':[15, 21, 1, 7, 6, 1, 25, 8, 56, 0, 5, 9, 0, 12, 12, 8, 7, 12, 0, 8]}
>>> df = pd.DataFrame(data)
>>> df
    Count
0      15
1      21
2       1
3       7
4       6
5       1
6      25
7       8
8      56
9       0
10      5
11      9
12      0
13     12
14     12
15      8
16      7
17     12
18      0
19      8
I need to add two columns to this df to detect "floods". A "flood" starts at the row where 'Count' goes above 10 and lasts until 'Count' drops below 5. So, in this case, I want this as a result:
    Count  Flood  FloodNumber
0      15   True            1
1      21   True            1
2       1  False            0
3       7  False            0
4       6  False            0
5       1  False            0
6      25   True            2
7       8   True            2
8      56   True            2
9       0  False            0
10      5  False            0
11      9  False            0
12      0  False            0
13     12   True            3
14     12   True            3
15      8   True            3
16      7   True            3
17     12   True            3
18      0  False            0
19      8  False            0
I managed to add my 'Flood' column with a simple loop like this:
df.loc[0, 'Flood'] = (df.loc[0, 'Count'] > 10)
for index in range(1, len(df)):
    df.loc[index, 'Flood'] = ((df.loc[index, 'Count'] > 10) | ((df.loc[index-1, 'Flood']) & (df.loc[index, 'Count'] > 5)))
But this seems like an extremely slow and stupid way of doing this. Is there any "proper" way of doing it using pandas functions rather than loops?
Upvotes: 2
Views: 31
Reputation: 59274
To find the Flood flags, we can play with masks and ffill().
df['Flood'] = ((df.Count > 10).where(df.Count > 10)
                              .fillna((df.Count > 5).where(df.Count < 5))
                              .ffill()
                              .astype(bool))
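The chained one-liner is a bit dense, so here is the same idea broken into intermediate steps. This is just an illustrative sketch; the start, stop and state names are mine and not part of the expression above:

start = df['Count'] > 10   # rows that unconditionally belong to a flood
stop  = df['Count'] < 5    # rows that definitely end a flood

# 1.0 where a flood starts or continues strongly, 0.0 where it ends,
# NaN for the ambiguous 5..10 range in between
import numpy as np
state = pd.Series(np.select([start, stop], [1.0, 0.0], default=np.nan), index=df.index)

# forward-fill carries the last definite state through the ambiguous rows;
# any leading NaNs (a series that opens in the 5..10 range) become False here
df['Flood'] = state.ffill().fillna(0).astype(bool)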
To get the FloodNumber, let's ignore all rows which are False in the Flood column, use cumsum to label each consecutive run of True rows, and number the runs with groupby + ngroup:
s = df.Flood.where(df.Flood)
df.loc[:, 'FloodNumber'] = s.dropna().groupby((s != s.shift(1)).cumsum()).ngroup().add(1)
Outputs
    Count  Flood  FloodNumber
0      15   True          1.0
1      21   True          1.0
2       1  False          NaN
3       7  False          NaN
4       6  False          NaN
5       1  False          NaN
6      25   True          2.0
7       8   True          2.0
8      56   True          2.0
9       0  False          NaN
10      5  False          NaN
11      9  False          NaN
12      0  False          NaN
13     12   True          3.0
14     12   True          3.0
15      8   True          3.0
16      7   True          3.0
17     12   True          3.0
18      0  False          NaN
19      8  False          NaN
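If you want 0 instead of NaN for the non-flood rows, as in the desired output, one extra step (not part of the solution above, just a suggestion) is to fill and cast the column:

# replace NaN with 0 and make the column integer to match the requested output
df['FloodNumber'] = df['FloodNumber'].fillna(0).astype(int)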
Upvotes: 1