Alex Cranston
Alex Cranston

Reputation: 25

Getting maximum values in a column

My dataframe looks like this:

Country Code Duration
A        1     0
A        1     1
A        1     2
A        1     3
A        2     0
A        2     1
A        1     0
A        1     1
A        1     2

I need to get max values from a "Duration" column - not just a maximum value, but a list of maximum values for each sequence of numbers in this column. The output might look like this:

Country Code Duration
  A      1     3
  A      2     1
  A      1     2

I could have grouped by "Code", but its values are often repeating, so that's probably not an option. Any help or tips would be much appreciated.

Upvotes: 1

Views: 70

Answers (4)

BENY
BENY

Reputation: 323396

Using idxmax after create another group key by diff and cumsum

df.loc[df.groupby([df.Country,df.Code.diff().ne(0).cumsum()]).Duration.idxmax()]
  Country  Code  Duration
3       A     1         3
5       A     2         1
8       A     1         2

Upvotes: 2

Erfan
Erfan

Reputation: 42946

First we create a mask to mark the sequences. Then we groupby to create the wanted output:

m = (~df['Code'].eq(df['Code'].shift())).cumsum()

df.groupby(m).agg({'Country':'first',
                   'Code':'first',
                   'Duration':'max'}).reset_index(drop=True)

  Country  Code  Duration
0       A     1         3
1       A     2         1
2       A     1         2

Upvotes: 2

Ayoub ZAROU
Ayoub ZAROU

Reputation: 2417

You might wanna check this link , it might be the answer you're looking for : pandas groupby where you get the max of one column and the min of another column . It goes as :

result = df.groupby(['Code', 'Country']).agg({'Duration':'max'})[['Duration']].reset_index()

Upvotes: 0

Zerodf
Zerodf

Reputation: 2298

The problem is slightly unclear. However, assuming that order is important, we can move toward a solution.

import pandas as pd
d = pd.read_csv('data.csv')

s = d.Code
d['series'] = s.ne(s.shift()).cumsum()
print(pd.DataFrame(d.groupby(['Country','Code','series'])['Duration'].max().reset_index()))

Returns:

 Country  Code  series  Duration
0       A     1       1         3
1       A     1       3         2
2       A     2       2         1

You can then drop the series.

Upvotes: 0

Related Questions