lara_toff

Reputation: 442

Pandas groupby with isin for consecutive groups

I have a dataframe that looks like the following:

arr = pd.DataFrame([[0,0],[0,1],[0,4],[1,4],[1,5],[1,6],[2,5],[2,8],[2,6]])

My desired output is a boolean for each row indicating whether the value in the second column (column 1) also appears in the next consecutive group. The groups are defined by the values in the first column (column 0). For example, 4 shows up in group 0 and also in the next consecutive group, group 1, so its row in group 0 is True:

output = pd.DataFrame([[False],[False],[True],[False],[True],[True],[np.nan],[np.nan],[np.nan]])

The outputs for group 2 would be NaN because group 3 doesn't exist.

So far I have tried this:

output = arr.groupby([0])[1].isin(arr.groupby([0])[1].shift(periods=-1))

This doesn't work because I can't apply the isin() on a groupby series.

Upvotes: 1

Views: 154

Answers (1)

RJ Adriaansen

Reputation: 9639

You could create a helper column holding the list of values from the next group (the per-group lists shifted by one), then check each row against it with a function that returns True, False, or NaN:

import pandas as pd
import numpy as np

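arr = pd.DataFrame([[0,0],[0,1],[0,4],[1,4],[1,5],[1,6],[2,5],[2,8],[2,6]])
# Collect each group's values into a list, shift so every group sees the next group's list,
# then merge back: the original values end up in '1_x', the shifted lists in '1_y'.
arr = pd.merge(arr, arr.groupby([0]).agg(list).shift(-1).reset_index(), on=[0], how='outer')

def check_columns(row):
    try:
        return row['1_x'] in row['1_y']
    except TypeError:
        # The last group has no next group, so '1_y' is NaN and `in` raises TypeError.
        return np.nan

arr.apply(check_columns, axis=1)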

Result:

0    False
1    False
2     True
3    False
4     True
5     True
6      NaN
7      NaN
8      NaN
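
As a possible variation on the same idea, the merge can be skipped by keeping the shifted per-group values in a lookup Series. This is only a sketch and not part of the approach above: the names group_values, next_values and in_next_group are made up for illustration, and like the merge version it treats "next group" as the next group label in sorted order rather than literally label + 1.

```python
import numpy as np
import pandas as pd

arr = pd.DataFrame([[0,0],[0,1],[0,4],[1,4],[1,5],[1,6],[2,5],[2,8],[2,6]])

# One set of column-1 values per group label in column 0.
group_values = arr.groupby(0)[1].agg(set)

# Shift so each group label maps to the NEXT group's set (NaN for the last group).
next_values = group_values.shift(-1)

def in_next_group(row):
    # Hypothetical helper: look up the next group's values for this row's group.
    candidates = next_values[row[0]]
    if not isinstance(candidates, set):
        # No following group exists, so there is nothing to compare against.
        return np.nan
    return row[1] in candidates

result = arr.apply(in_next_group, axis=1)
print(result)
```

Sets are used instead of lists only to make the membership test cheap; arr itself is left unchanged, so no suffixed columns such as '1_x' and '1_y' are created.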

Upvotes: 1
