Michael
Michael

Reputation: 329

Check if specific values in column follow each other for each id

I have the following dataframe

id | status
____________
1  | reserved
2  | signed
1  | waiting
1  | signed
3  | waiting
2  | sold
3  | reserved
1  | sold

I want to chech a hypothesis that statuses reserved, waiting, signed always lead to status sold. I only need to check the following order, some statuses may be omitted like for id == 2 in dataframe.

I wonder if there's a way to look for next row values in grouped by id dataframe

Expected output is dataframe or list of ids that follow the above rule. For the dataframe above it would be this:

id
__
1
2

My attemp was to get all unique id with those statuses and then for each id found list of it's statuses. Then I thought to filter it somehow but there are a lot of combinations.

df = df[df.status.isin(['reserved', 'waiting', 'signed', 'sold'])]

df1 = df.groupby('flat_id').['status'].unique()

df1.where('status'== [''reserved', 'waiting', 'signed', 'sold'']
or ['reserved', 'waiting', 'sold'] ... )

Upvotes: 0

Views: 43

Answers (1)

mozway
mozway

Reputation: 261994

IIUC, you just want to check that 'sold' is the last value per group:

m = df.groupby('id')['status'].apply(lambda s: s.iloc[-1] == 'sold')
out = m[m].index.tolist()

output: [1, 2]

If you want to ensure there is something before 'sold':

m = df.groupby('id')['status'].apply(lambda s: len(s)>1 and s.iloc[-1] == 'sold')

And if you want to ensure that this something is in a specific list:

m = df.groupby('id')['status'].apply(lambda s: s.isin(['reserved', 'waiting', 'signed']).any()
                                           and s.iloc[-1] == 'sold')
m[m].index.tolist()
alternative:
(df.drop_duplicates('id', keep='last')
   .loc[lambda d: d['status'].eq('sold'), 'id']
)

output:

5    2
7    1
Name: id, dtype: int64

Upvotes: 1

Related Questions