Reputation: 679
I'm having trouble to to filter a dataframe using the result of groupby condition. I already tried to use an mask with isin()
but it doesn't return only the paired conditions.
Suppose i have a dataset like this below:
Dest Origin DepDelay
0 TPA IAD 8.0
1 TPA IAD 19.0
2 BWI IND 8.0
4 BWI IND 34.0
5 JAX IND 25.0
6 LAS IND 67.0
8 MCI IND 2.0
10 MCO IND 6.0
11 MCO IND 94.0
... .... ... ...
7009710 ATL MSP 30.0
I wanna filter this using the result of groupby condition, where i got all top 5 routes. To obtain the routes i used this code:
top_5_route = flights_df[flights_df['DepDelay'] > 0].groupby(['Origin', 'Dest'])['Dest'].size().nlargest(5).index.to_list()
top_5_route:
[('LAX', 'SFO'),
('DAL', 'HOU'),
('SFO', 'LAX'),
('ORD', 'LGA'),
('HOU', 'DAL')]
i wanna filter this dataframe based on this labels to get only the paired conditions for "Origin - Dest". For example, the new dataframe should contain only values where origin is LAX
and the Dest is SFO
and the others paired conditions.
If i use isin
method the dataframe also will contain values LAX - SFO
, LAX-HOU
. That does not match the paired condition.
Thanks!
Upvotes: 2
Views: 393
Reputation: 9806
You can create a new column called 'Route':
flights_df['Route'] = flights_df['Origin'] + '-' + flights_df['Dest']
And then group by route to get the top 5 delayed routes
top_5_route = flights_df[flights_df['DepDelay'] > 0.0].groupby('Route').size().nlargest(5)
To filter the flights_df
on these routes:
flights_df[flights_df['Route'].isin(top_5_route.keys())]
Upvotes: 1
Reputation: 75080
IIUC, you need to filter rows which matches your list of tuple,use:
df.set_index(['Dest','Origin']).loc[top_5_route].reset_index()
Upvotes: 2