Reputation: 25
My dataframe looks like this:
Country Code Duration
A 1 0
A 1 1
A 1 2
A 1 3
A 2 0
A 2 1
A 1 0
A 1 1
A 1 2
I need to get max values from a "Duration" column - not just a maximum value, but a list of maximum values for each sequence of numbers in this column. The output might look like this:
Country Code Duration
A 1 3
A 2 1
A 1 2
I could have grouped by "Code", but its values are often repeating, so that's probably not an option. Any help or tips would be much appreciated.
Upvotes: 1
Views: 70
Reputation: 323396
Using idxmax
after create another group key by diff
and cumsum
df.loc[df.groupby([df.Country,df.Code.diff().ne(0).cumsum()]).Duration.idxmax()]
Country Code Duration
3 A 1 3
5 A 2 1
8 A 1 2
Upvotes: 2
Reputation: 42946
First we create a mask to mark the sequences. Then we groupby to create the wanted output:
m = (~df['Code'].eq(df['Code'].shift())).cumsum()
df.groupby(m).agg({'Country':'first',
'Code':'first',
'Duration':'max'}).reset_index(drop=True)
Country Code Duration
0 A 1 3
1 A 2 1
2 A 1 2
Upvotes: 2
Reputation: 2417
You might wanna check this link , it might be the answer you're looking for : pandas groupby where you get the max of one column and the min of another column . It goes as :
result = df.groupby(['Code', 'Country']).agg({'Duration':'max'})[['Duration']].reset_index()
Upvotes: 0
Reputation: 2298
The problem is slightly unclear. However, assuming that order is important, we can move toward a solution.
import pandas as pd
d = pd.read_csv('data.csv')
s = d.Code
d['series'] = s.ne(s.shift()).cumsum()
print(pd.DataFrame(d.groupby(['Country','Code','series'])['Duration'].max().reset_index()))
Returns:
Country Code series Duration
0 A 1 1 3
1 A 1 3 2
2 A 2 2 1
You can then drop the series.
Upvotes: 0