arpita
arpita

Reputation: 51

To perform Calculations based on condition without specifying column values in python

I have a csv file, in which i need to do some operations on column without specifying the column values .

The input csv(df) is as follows:

weather speed   type    cal_A   cal_B
 good   0-3     cold    12       10
 good   0-3     cold    21       7
 good   0-3     cold    31       5
 good   0-3     cold    17       1
 good   3-5     cold    19       17
 bad    0-3     hot     15       4
 bad    6-9     hot     21       13
 bad    6-9     hot     15       7
 bad    6-9     cold    21       4
 rainy  0-3     cold    14       7
 rainy  5-8     cold    21       10
 rainy  5-8     cold    2        3
 rainy  5-8     cold    18       16

In this csv i need to divide the column named cal_A ,cal_B by grouping on column weather ,type and speed and then find the minimum ,maximum, and mean and put them as separate columns.

The minimum , maximum and mean is calculated after the division of columns cal_A and cal_B.

The output file is as follows:

weather speed   type    cal_A/cal_B(min)        cal_A/cal_B(max)    cal_A/cal_B(mean)
good    0-3     cold    1.2                        17
good    3-5     cold    1.11                       1.11
bad     0-3     hot     3.75                       3.75
bad     6-9     hot     1.61                       2.14
bad     6-9     cold    5.25                       5.25
rainy   0-3     cold    2                          2
rainy   5-8     cold    0.6                        2.1

The code which i have tried is as follows:

df=df.groupby(['weather','speed','type'],as_index=False).min().eval('cal_A/cal_B(min)=cal_A/cal_B') df=df.groupby(['weather','speed','type'],as_index=False).max().eval('cal_A/cal_B(max)=cal_A/cal_B')

The above code groups the columns weather, speed and type then provides the minimum and value from the calculation but this code is not providing me the expected output.

Upvotes: 0

Views: 163

Answers (2)

ALollz
ALollz

Reputation: 59519

You would perform the division first then group and aggregate that Series.

(df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False).agg(['min', 'max', 'mean'])

If you want to exactly reproduce your output we can use the add_prefix/suffix methods (though probably more efficient to just rename the columns object).

((df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False)
   .agg(['min', 'max', 'mean'])
   .add_prefix('cal_A/cal_B(')
   .add_suffix(')')
   .reset_index())

  weather speed  type  cal_A/cal_B(min)  cal_A/cal_B(max)  cal_A/cal_B(mean)
0    good   0-3  cold          1.200000         17.000000           6.850000
1    good   3-5  cold          1.117647          1.117647           1.117647
2     bad   0-3   hot          3.750000          3.750000           3.750000
3     bad   6-9   hot          1.615385          2.142857           1.879121
4     bad   6-9  cold          5.250000          5.250000           5.250000
5   rainy   0-3  cold          2.000000          2.000000           2.000000
6   rainy   5-8  cold          0.666667          2.100000           1.297222

Upvotes: 3

Celius Stingher
Celius Stingher

Reputation: 18367

You can use NamedAgg to solve the issue if working with pandas 0.25+:

import pandas as pd
import numpy as np 
data = {'weather':['good','good','good','good','good','bad','bad','bad','bad','rainy','rainy','rainy','rainy'],'speed':['0-3','0-3','0-3','0-3','3-5','0-3','6-9','6-9','6-9','0-3','5-8','5-8','5-8'],'type':['cold','cold','cold','cold','cold','hot','hot','hot','cold','cold','cold','cold','cold'],'cal_A':[12,21,31,17,19,15,21,15,21,14,21,2,18],'cal_B':[10,7,5,1,17,4,13,7,4,7,10,3,16]}
df = pd.DataFrame(data)
df['divided'] = df['cal_A']/df['cal_B']
output = df.groupby(['weather','speed','type']).agg(
    minimum=pd.NamedAgg(column='divided',aggfunc='min'),
    maximum=pd.NamedAgg(column='divided',aggfunc='max'),
    mean=pd.NamedAgg(column='divided',aggfunc='mean'))
print(output)

output:

                     minimum    maximum      mean
weather speed type
bad     0-3   hot   3.750000   3.750000  3.750000
        6-9   cold  5.250000   5.250000  5.250000
              hot   1.615385   2.142857  1.879121
good    0-3   cold  1.200000  17.000000  6.850000
        3-5   cold  1.117647   1.117647  1.117647
rainy   0-3   cold  2.000000   2.000000  2.000000
        5-8   cold  0.666667   2.100000  1.297222

Upvotes: 2

Related Questions