Selecting subset of pandas groupby dataframe where more than one key has values

Question

I have a dataframe of values that looked like this:

    mode   journey  stage
0    BUS   1        1
1    RTS   1        2
2    BUS   2        1
3    RTS   3        1
4    BUS   3        2
5    BUS   4        1

I did a groupby ['journey','mode'] and obtained this:

g=df.groupby(['journey','mode'])  
g.size()

journey  mode
1        BUS    1
         RTS    1
2        BUS    1
3        RTS    1
         BUS    1
4        BUS    1

I would want to identify journeys that involved both BUS and RTS, so in this case, the result should return journey 1 and 3. May I know how this can be done? Thanks.

BrenBarn · Accepted Answer

You can group by journey alone and then check whether each group contains more than one stage:

>>> df.groupby('journey')['mode'].apply(lambda g: len(g)>1)
journey
1           True
2          False
3           True
4          False
dtype: bool

If your data contains other modes and you specifically need to check for ones that involve "BUS" and "RTS", you can do a more explicit check:

>>> d.groupby('journey')['mode'].apply(lambda g: 'BUS' in g.values and 'RTS' in g.values)
journey
1           True
2          False
3           True
4          False
dtype: bool

The basic idea, though, is that if you want to find "journeys that involved such-and-such", you should group only by journey, and then compute for each group whether it matches the such-and-such condition.

Selecting subset of pandas groupby dataframe where more than one key has values

Answers (1)

Related Questions