To perform Calculations based on condition without specifying column values in python

Question

I have a csv file, in which i need to do some operations on column without specifying the column values .

The input csv(df) is as follows:

weather speed   type    cal_A   cal_B
 good   0-3     cold    12       10
 good   0-3     cold    21       7
 good   0-3     cold    31       5
 good   0-3     cold    17       1
 good   3-5     cold    19       17
 bad    0-3     hot     15       4
 bad    6-9     hot     21       13
 bad    6-9     hot     15       7
 bad    6-9     cold    21       4
 rainy  0-3     cold    14       7
 rainy  5-8     cold    21       10
 rainy  5-8     cold    2        3
 rainy  5-8     cold    18       16

In this csv i need to divide the column named cal_A ,cal_B by grouping on column weather ,type and speed and then find the minimum ,maximum, and mean and put them as separate columns.

The minimum , maximum and mean is calculated after the division of columns cal_A and cal_B.

The output file is as follows:

weather speed   type    cal_A/cal_B(min)        cal_A/cal_B(max)    cal_A/cal_B(mean)
good    0-3     cold    1.2                        17
good    3-5     cold    1.11                       1.11
bad     0-3     hot     3.75                       3.75
bad     6-9     hot     1.61                       2.14
bad     6-9     cold    5.25                       5.25
rainy   0-3     cold    2                          2
rainy   5-8     cold    0.6                        2.1

The code which i have tried is as follows:

df=df.groupby(['weather','speed','type'],as_index=False).min().eval('cal_A/cal_B(min)=cal_A/cal_B') df=df.groupby(['weather','speed','type'],as_index=False).max().eval('cal_A/cal_B(max)=cal_A/cal_B')

The above code groups the columns weather, speed and type then provides the minimum and value from the calculation but this code is not providing me the expected output.

ALollz · Accepted Answer

You would perform the division first then group and aggregate that Series.

(df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False).agg(['min', 'max', 'mean'])

If you want to exactly reproduce your output we can use the add_prefix/suffix methods (though probably more efficient to just rename the columns object).

((df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False)
   .agg(['min', 'max', 'mean'])
   .add_prefix('cal_A/cal_B(')
   .add_suffix(')')
   .reset_index())

  weather speed  type  cal_A/cal_B(min)  cal_A/cal_B(max)  cal_A/cal_B(mean)
0    good   0-3  cold          1.200000         17.000000           6.850000
1    good   3-5  cold          1.117647          1.117647           1.117647
2     bad   0-3   hot          3.750000          3.750000           3.750000
3     bad   6-9   hot          1.615385          2.142857           1.879121
4     bad   6-9  cold          5.250000          5.250000           5.250000
5   rainy   0-3  cold          2.000000          2.000000           2.000000
6   rainy   5-8  cold          0.666667          2.100000           1.297222

To perform Calculations based on condition without specifying column values in python

Answers (2)

Related Questions