Mehdi Zare

Reputation: 1381

Finding the index of rows based on a sequence of values in a column of pandas DataFrame

I have a DataFrame with a column that takes three unique string values. I need to generate a list containing the indexes of rows that have 'very bad' immediately after 'good', but not 'very bad' after 'bad'.

import random

import pandas as pd

df = pd.DataFrame({
    'measure': [random.randint(0,10) for _ in range(0,20)],
})

df['status'] = df.apply(
    lambda x: 'good' if x['measure'] > 4 else 'very bad' if x['measure'] < 2  else 'bad',
    axis=1)

    measure    status
0         8      good
1         8      good
2         0  very bad
3         5      good
4         2       bad
5         3       bad
6         9      good
7         9      good
8        10      good
9         5      good
10        1  very bad
11        7      good
12        7      good
13        6      good
14        5      good
15       10      good
16        3       bad
17        0  very bad
18        3       bad
19        5      good

I expect to get this list:

[2, 10]

Is there a one-line solution to this?

I don't want to rely on the numeric values, as they are used here purely to generate the DataFrame, and I don't want to loop over all rows, which is computationally expensive for my use case.

Upvotes: 0

Views: 810

Answers (3)

Mark Wang

Reputation: 2757

df.loc[lambda x:x.status.eq('very bad') & x.status.shift().eq('good')].index.tolist()
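For readers who want to see what the one-liner is doing, here is a rough step-by-step sketch. The `measure` values are copied from the table in the question so the result is reproducible; the intermediate names `is_very_bad` and `prev_is_good` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Values copied from the question's printed table (not randomly generated here).
measure = [8, 8, 0, 5, 2, 3, 9, 9, 10, 5, 1, 7, 7, 6, 5, 10, 3, 0, 3, 5]
df = pd.DataFrame({'measure': measure})
df['status'] = df['measure'].apply(
    lambda m: 'good' if m > 4 else 'very bad' if m < 2 else 'bad')

# True where the current row is 'very bad'.
is_very_bad = df['status'].eq('very bad')

# shift() moves every value down one row, so each row sees the previous
# row's status; the first row has no predecessor and compares as False.
prev_is_good = df['status'].shift().eq('good')

# Keep only rows where both conditions hold, then collect their index labels.
result = df.loc[is_very_bad & prev_is_good].index.tolist()
print(result)
```

Row 17 is excluded because its predecessor (row 16) is 'bad', which is the behaviour the question asks for.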

Upvotes: 0

Andy L.

Reputation: 25259

Try combining eq, shift, and loc:

s = df.status.eq('very bad')
s1 = df.status.eq('good').shift()

In [30]: (s & s1).loc[lambda x:x].index.tolist()
Out[30]: [2, 10]
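One detail that may be worth knowing: shifting a boolean Series leaves a missing value in the first row, so `s1` above typically ends up with object dtype rather than bool. A possible variant, assuming a pandas version whose `shift` accepts the `fill_value` argument, keeps both masks strictly boolean. The `status` list below is copied from the question's table.

```python
import pandas as pd

# Status column copied from the question's printed table.
status = ['good', 'good', 'very bad', 'good', 'bad', 'bad', 'good', 'good',
          'good', 'good', 'very bad', 'good', 'good', 'good', 'good', 'good',
          'bad', 'very bad', 'bad', 'good']
df = pd.DataFrame({'status': status})

s = df.status.eq('very bad')
# fill_value=False fills the vacated first row with False instead of NaN,
# so the shifted mask stays a plain boolean Series.
s1 = df.status.eq('good').shift(fill_value=False)

result = (s & s1).loc[lambda x: x].index.tolist()
print(result)
```

This should give the same list as the original snippet; the only difference is how the first row is handled.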

Upvotes: 0

Scott Boston

Reputation: 153500

If your DataFrame uses the default RangeIndex, you can use this:

import numpy as np

np.where((df['status'] == 'very bad') & (df['status'].shift() == 'good'))[0]

Output:

array([ 2, 10], dtype=int64)

Otherwise, np.where returns positions rather than labels, so map them back through the index:

irow = np.where((df['status'] == 'very bad') & (df['status'].shift() == 'good'))[0]
df.index[irow]
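To illustrate the difference between positions and labels, here is a small sketch with a non-default index. The letter labels `'a'` through `'t'` are made up for this example; the `status` values are copied from the question's table.

```python
import numpy as np
import pandas as pd

# Status column copied from the question's printed table.
status = ['good', 'good', 'very bad', 'good', 'bad', 'bad', 'good', 'good',
          'good', 'good', 'very bad', 'good', 'good', 'good', 'good', 'good',
          'bad', 'very bad', 'bad', 'good']
# Hypothetical string index, used only to show that labels != positions.
df = pd.DataFrame({'status': status}, index=list('abcdefghijklmnopqrst'))

# np.where yields integer positions of the True entries.
irow = np.where((df['status'] == 'very bad') & (df['status'].shift() == 'good'))[0]

# Translate positions into the DataFrame's own index labels.
labels = df.index[irow].tolist()
print(irow, labels)
```

Here the positions are 2 and 10, while the corresponding labels are 'c' and 'k'.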

Upvotes: 1
