eemilk

Reputation: 1628

Pandas dataframe: Get value pairs from subsets of dataframe

I have a df:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3, 4, 4, 4],
                   'name': ['call', 'response', 'call', 'call', 'response', 'call', 'call', 'response', 'response']})
    id  name
0   1   call
1   1   response
2   2   call
3   2   call
4   2   response
5   3   call
6   4   call
7   4   response
8   4   response

I'm trying to extract call-response pairs, where a call is paired with the first response that directly follows it. Each pair must come from the same id subset, like so:

    id  name
0   1   call
1   1   response
3   2   call
4   2   response
6   4   call
7   4   response

Ideally I'd keep the indexes in the dataframe so I can use df.loc with indexes later.

What I have tried is to go through the df in subsets and apply a function or use a rolling window, but so far I have only gotten errors.

unique_ids = df.id.unique()

for unique_id in unique_ids:
    df.query('id == @unique_id').apply(something)

I have yet to discover something that works specifically with subsets of a dataframe.

Upvotes: 2

Views: 549

Answers (1)

jezrael

Reputation: 862511

Use DataFrameGroupBy.shift to get the next and previous value within each id group, compare them with Series.eq, and filter with boolean indexing:

m1 = df['name'].eq('call') & df.groupby('id')['name'].shift(-1).eq('response')
m2 = df['name'].eq('response') & df.groupby('id')['name'].shift().eq('call')
df2 = df[m1 | m2]

print(df2)
   id      name
0   1      call
1   1  response
3   2      call
4   2  response
6   4      call
7   4  response
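
For reference, the same approach can be written as a self-contained sketch that also shows the original index surviving the filter, so it can be passed back to df.loc. The helper name call_response_pairs and its parameters are hypothetical, chosen only for this illustration; the logic is the two shifted masks from above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2, 3, 4, 4, 4],
    'name': ['call', 'response', 'call', 'call', 'response',
             'call', 'call', 'response', 'response'],
})


def call_response_pairs(frame, group_col='id', value_col='name'):
    """Hypothetical helper: keep rows where a call is directly followed
    by a response within the same group, plus that response row."""
    grouped = frame.groupby(group_col)[value_col]
    nxt = grouped.shift(-1)  # next value inside the same group (NaN at group end)
    prev = grouped.shift()   # previous value inside the same group (NaN at group start)

    is_call = frame[value_col].eq('call') & nxt.eq('response')
    is_response = frame[value_col].eq('response') & prev.eq('call')
    return frame[is_call | is_response]


pairs = call_response_pairs(df)

# Boolean indexing keeps the original index labels,
# so they can be reused with df.loc on the full frame.
same_rows = df.loc[pairs.index]
print(pairs)
```

Because the shift happens per group, a response at the start of one id is never paired with a call at the end of the previous id.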

Upvotes: 5
