n_lev
n_lev

Reputation: 45

Using Certain Conditions to Find Certain Parts of a Dataframe

I have a dataframe that looks like this:

>>> data = {'Count':[15, 21, 1, 7, 6, 1, 25, 8, 56, 0, 5, 9, 0, 12, 12, 8, 7, 12, 0, 8]}
>>> df = pd.DataFrame(data)
>>> df
    Count
0      15
1      21
2       1
3       7
4       6
5       1
6      25
7       8
8      56
9       0
10      5
11      9
12      0
13     12
14     12
15      8
16      7
17     12
18      0
19      8

I need to add two columns to this df to detect "floods". "Flood" is defined as from the row where 'Count' goes above 10 and until 'Count' drops below 5. So, in this case, I want this as a result:

    Count   Flood   FloodNumber
0      15    True             1
1      21    True             1
2       1   False             0
3       7   False             0
4       6   False             0
5       1   False             0
6      25    True             2
7       8    True             2
8      56    True             2
9       0   False             0
10      5   False             0
11      9   False             0
12      0   False             0
13     12    True             3
14     12    True             3
15      8    True             3
16      7    True             3
17     12    True             3
18      0   False             0
19      8   False             0

I managed to add my 'Flood' column with a simple loop like this:

df.loc[0, 'Flood'] = (df.loc[0, 'Count'] > 10)
for index in range(1, len(df)):
    df.loc[index, 'Flood'] = ((df.loc[index, 'Count'] > 10) | ((df.loc[index-1, 'Flood']) & (df.loc[index, 'Count'] > 5)))

, but this seems like an extremly slow and stupid way of doing this. Is there any "proper" way of doing it using pandas functions rather than loops?

Upvotes: 2

Views: 31

Answers (1)

rafaelc
rafaelc

Reputation: 59274

To find Flood flags, we can play with masks and ffill().

df['Flood'] = ((df.Count > 10).where(df.Count > 10)
               .fillna((df.Count > 5)
                       .where(df.Count < 5))
               .ffill()
               .astype(bool))

To get the FloodNumber, let's ignore all rows which are False in the Flood column and groupby+cumsum

s = df.Flood.where(df.Flood)
df.loc[:, 'FloodNumber'] = s.dropna().groupby((s != s.shift(1)).cumsum()).ngroup().add(1)

Outputs

    Count  Flood  FloodNumber
0      15   True          1.0
1      21   True          1.0
2       1  False          NaN
3       7  False          NaN
4       6  False          NaN
5       1  False          NaN
6      25   True          2.0
7       8   True          2.0
8      56   True          2.0
9       0  False          NaN
10      5  False          NaN
11      9  False          NaN
12      0  False          NaN
13     12   True          3.0
14     12   True          3.0
15      8   True          3.0
16      7   True          3.0
17     12   True          3.0
18      0  False          NaN
19      8  False          NaN

Upvotes: 1

Related Questions