Reputation: 45
I have a dataframe of values that looked like this:
mode journey stage
0 BUS 1 1
1 RTS 1 2
2 BUS 2 1
3 RTS 3 1
4 BUS 3 2
5 BUS 4 1
I did a groupby ['journey','mode'] and obtained this:
g=df.groupby(['journey','mode'])
g.size()
journey mode
1 BUS 1
RTS 1
2 BUS 1
3 RTS 1
BUS 1
4 BUS 1
I would want to identify journeys that involved both BUS and RTS, so in this case, the result should return journey 1 and 3. May I know how this can be done? Thanks.
Upvotes: 3
Views: 15578
Reputation: 251518
You can group by journey alone and then check whether each group contains more than one stage:
>>> df.groupby('journey')['mode'].apply(lambda g: len(g)>1)
journey
1 True
2 False
3 True
4 False
dtype: bool
If your data contains other modes and you specifically need to check for ones that involve "BUS" and "RTS", you can do a more explicit check:
>>> d.groupby('journey')['mode'].apply(lambda g: 'BUS' in g.values and 'RTS' in g.values)
journey
1 True
2 False
3 True
4 False
dtype: bool
The basic idea, though, is that if you want to find "journeys that involved such-and-such", you should group only by journey, and then compute for each group whether it matches the such-and-such condition.
Upvotes: 1