Reputation: 29
I have the following data frame :
Loc1 | Loc2 | Month | Trips |
---|---|---|---|
a | b | 1 | 200 |
a | b | 4 | 500 |
a | b | 7 | 600 |
c | d | 6 | 400 |
c | d | 4 | 300 |
I need to find out for every route (Loc1 to Loc2) which month has the most trips and with the corresponding trips number .
I run some code but the output I get is as follows. How do I get the Trips column appear together.
Loc1 | Loc2 | Month |
---|---|---|
a | b | 7 |
c | d | 6 |
The code I used as below : df = pd.read_csv('data.csv') df = df[['Loc1','Loc2','Month','Trips']]
df = df.pivot_table(index = ['Loc1', 'Loc2'],
columns = 'Month',
values = 'Trips',)
df = df.idxmax(axis = 1)
df = df.reset_index()
print(f"Each route's busiest month : \n {df.to_string()}")
Upvotes: 0
Views: 190
Reputation: 260335
Try to sort by Trips in descending order and get the first row per group
df.sort_values(by='Trips', ascending=False).groupby(['Loc1', 'Loc2'], as_index=False).first()
Or:
df.sort_values(by='Trips').groupby(['Loc1', 'Loc2'], as_index=False).last()
NB. I couldn't run the code to test, but you get the general idea.
Upvotes: 2