Reputation: 179
This query is related to How to get the max value of a multiple column group-by pandas? and well as Pandas DataFrame find the max after Groupby two columns and get counts
I am trying to create min and max values from two columns within grouped data
I have a data set in this shape
measure measure_group route year actual budget
AC electrification A 20182019 103 99
AC electrification A 20192020 110 122
AC electrification B 20182019 9 10
AC electrification B 20192020 55 50
HV electrification A 20182019 2 10
HV electrification A 20192020 7 15
HV electrification B 20182019 67 10
HV electrification B 20192020 100 115
cat 1 track A 20182019 10 15
cat 1 track A 20192020 111 25
cat 1 track B 20182019 55 16
cat 1 track B 20192020 75 175
cat 2 track A 20182019 84 5
cat 2 track A 20192020 125 1005
cat 2 track B 20182019 7 4
cat 2 track B 20192020 15 25
What I would like to have is as new columns the min and max of [actual, budget] for each combination of measure, measure_group, route, something like this
measure measure_group route year actual budget min max
AC electrification A 20182019 103 99 99 122
AC electrification A 20192020 110 122 99 122
AC electrification B 20182019 9 10 9 55
AC electrification B 20192020 55 50 9 55
HV electrification A 20182019 2 10 2 15
HV electrification A 20192020 7 15 2 15
HV electrification B 20182019 67 10 10 115
HV electrification B 20192020 100 115 10 115
cat 1 track A 20182019 10 15 10 111
cat 1 track A 20192020 111 25 10 111
cat 1 track B 20182019 55 16 16 175
cat 1 track B 20192020 75 175 16 175
cat 2 track A 20182019 84 5 5 1005
cat 2 track A 20192020 125 1005 5 1005
cat 2 track B 20182019 7 4 4 25
cat 2 track B 20192020 15 25 4 25
I've tried various combination of df.groupby df_remapped['min'] = df_remapped.groupby(['Measure','measure_group','route'])[['Actual','Budget']].transform('min')
, but this return a value error: Wrong number of items passed 2, placement implies 1
I have a sense I'm trying to return the two columns into the one new column.
I did consider generating a free-standing dataframe, and then rejoining to the original dataframe using join on the common index, but this feels like a long-winded workaround....
Any pointers to possible approaches would be appreciated. It's odd that most of the examples of aggregation are for single columns only.
Upvotes: 1
Views: 2033
Reputation: 59579
You can melt
the DataFrame so that you consider either 'actual' or 'budget' when calculating the min or max. Then group the melted DataFrame and merge back.
id_vars = ['measure', 'measure_group', 'route']
df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
.groupby(id_vars)['value']
.agg(['min', 'max']))
df = df.merge(df1, how='left', on=id_vars)
measure measure_group route year actual budget min max
0 AC electrification A 20182019 103 99 99 122
1 AC electrification A 20192020 110 122 99 122
2 AC electrification B 20182019 9 10 9 55
3 AC electrification B 20192020 55 50 9 55
4 HV electrification A 20182019 2 10 2 15
5 HV electrification A 20192020 7 15 2 15
6 HV electrification B 20182019 67 10 10 115
7 HV electrification B 20192020 100 115 10 115
8 cat1 track A 20182019 10 15 10 111
9 cat1 track A 20192020 111 25 10 111
10 cat1 track B 20182019 55 16 16 175
11 cat1 track B 20192020 75 175 16 175
12 cat2 track A 20182019 84 5 5 1005
13 cat2 track A 20192020 125 1005 5 1005
14 cat2 track B 20182019 7 4 4 25
15 cat2 track B 20192020 15 25 4 25
Upvotes: 1