Greg Williams
Greg Williams

Reputation: 179

python get max and min values across mutiple columns while grouping a dataframe

This query is related to How to get the max value of a multiple column group-by pandas? and well as Pandas DataFrame find the max after Groupby two columns and get counts

I am trying to create min and max values from two columns within grouped data

I have a data set in this shape

measure     measure_group     route     year      actual     budget
AC          electrification   A         20182019  103        99
AC          electrification   A         20192020  110        122
AC          electrification   B         20182019  9          10
AC          electrification   B         20192020  55         50
HV          electrification   A         20182019  2          10
HV          electrification   A         20192020  7          15
HV          electrification   B         20182019  67         10
HV          electrification   B         20192020  100        115
cat 1       track             A         20182019  10         15
cat 1       track             A         20192020  111        25
cat 1       track             B         20182019  55         16
cat 1       track             B         20192020  75         175
cat 2       track             A         20182019  84         5
cat 2       track             A         20192020  125        1005
cat 2       track             B         20182019  7          4
cat 2       track             B         20192020  15         25        

What I would like to have is as new columns the min and max of [actual, budget] for each combination of measure, measure_group, route, something like this

measure     measure_group     route     year      actual     budget  min  max
AC          electrification   A         20182019  103        99      99   122
AC          electrification   A         20192020  110        122     99   122
AC          electrification   B         20182019  9          10      9    55
AC          electrification   B         20192020  55         50      9    55
HV          electrification   A         20182019  2          10      2    15
HV          electrification   A         20192020  7          15      2    15
HV          electrification   B         20182019  67         10      10   115
HV          electrification   B         20192020  100        115     10   115
cat 1       track             A         20182019  10         15      10   111
cat 1       track             A         20192020  111        25      10   111
cat 1       track             B         20182019  55         16      16   175
cat 1       track             B         20192020  75         175     16   175
cat 2       track             A         20182019  84         5       5    1005
cat 2       track             A         20192020  125        1005    5    1005
cat 2       track             B         20182019  7          4       4    25
cat 2       track             B         20192020  15         25      4    25

I've tried various combination of df.groupby df_remapped['min'] = df_remapped.groupby(['Measure','measure_group','route'])[['Actual','Budget']].transform('min') , but this return a value error: Wrong number of items passed 2, placement implies 1 I have a sense I'm trying to return the two columns into the one new column.

I did consider generating a free-standing dataframe, and then rejoining to the original dataframe using join on the common index, but this feels like a long-winded workaround....

Any pointers to possible approaches would be appreciated. It's odd that most of the examples of aggregation are for single columns only.

Upvotes: 1

Views: 2033

Answers (1)

ALollz
ALollz

Reputation: 59579

You can melt the DataFrame so that you consider either 'actual' or 'budget' when calculating the min or max. Then group the melted DataFrame and merge back.

id_vars = ['measure', 'measure_group', 'route']

df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
         .groupby(id_vars)['value']
         .agg(['min', 'max']))

df = df.merge(df1, how='left', on=id_vars)

   measure    measure_group route      year  actual  budget  min   max
0       AC  electrification     A  20182019     103      99   99   122
1       AC  electrification     A  20192020     110     122   99   122
2       AC  electrification     B  20182019       9      10    9    55
3       AC  electrification     B  20192020      55      50    9    55
4       HV  electrification     A  20182019       2      10    2    15
5       HV  electrification     A  20192020       7      15    2    15
6       HV  electrification     B  20182019      67      10   10   115
7       HV  electrification     B  20192020     100     115   10   115
8     cat1            track     A  20182019      10      15   10   111
9     cat1            track     A  20192020     111      25   10   111
10    cat1            track     B  20182019      55      16   16   175
11    cat1            track     B  20192020      75     175   16   175
12    cat2            track     A  20182019      84       5    5  1005
13    cat2            track     A  20192020     125    1005    5  1005
14    cat2            track     B  20182019       7       4    4    25
15    cat2            track     B  20192020      15      25    4    25

Upvotes: 1

Related Questions