Reputation: 51
I have a csv file, in which i need to do some operations on column without specifying the column values .
The input csv(df) is as follows:
weather speed type cal_A cal_B
good 0-3 cold 12 10
good 0-3 cold 21 7
good 0-3 cold 31 5
good 0-3 cold 17 1
good 3-5 cold 19 17
bad 0-3 hot 15 4
bad 6-9 hot 21 13
bad 6-9 hot 15 7
bad 6-9 cold 21 4
rainy 0-3 cold 14 7
rainy 5-8 cold 21 10
rainy 5-8 cold 2 3
rainy 5-8 cold 18 16
In this csv i need to divide the column named cal_A ,cal_B by grouping on column weather ,type and speed and then find the minimum ,maximum, and mean and put them as separate columns.
The minimum , maximum and mean is calculated after the division of columns cal_A and cal_B.
The output file is as follows:
weather speed type cal_A/cal_B(min) cal_A/cal_B(max) cal_A/cal_B(mean)
good 0-3 cold 1.2 17
good 3-5 cold 1.11 1.11
bad 0-3 hot 3.75 3.75
bad 6-9 hot 1.61 2.14
bad 6-9 cold 5.25 5.25
rainy 0-3 cold 2 2
rainy 5-8 cold 0.6 2.1
The code which i have tried is as follows:
df=df.groupby(['weather','speed','type'],as_index=False).min().eval('cal_A/cal_B(min)=cal_A/cal_B')
df=df.groupby(['weather','speed','type'],as_index=False).max().eval('cal_A/cal_B(max)=cal_A/cal_B')
The above code groups the columns weather, speed and type then provides the minimum and value from the calculation but this code is not providing me the expected output.
Upvotes: 0
Views: 163
Reputation: 59519
You would perform the division first then group and aggregate that Series.
(df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False).agg(['min', 'max', 'mean'])
If you want to exactly reproduce your output we can use the add_prefix/suffix
methods (though probably more efficient to just rename the columns object).
((df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False)
.agg(['min', 'max', 'mean'])
.add_prefix('cal_A/cal_B(')
.add_suffix(')')
.reset_index())
weather speed type cal_A/cal_B(min) cal_A/cal_B(max) cal_A/cal_B(mean)
0 good 0-3 cold 1.200000 17.000000 6.850000
1 good 3-5 cold 1.117647 1.117647 1.117647
2 bad 0-3 hot 3.750000 3.750000 3.750000
3 bad 6-9 hot 1.615385 2.142857 1.879121
4 bad 6-9 cold 5.250000 5.250000 5.250000
5 rainy 0-3 cold 2.000000 2.000000 2.000000
6 rainy 5-8 cold 0.666667 2.100000 1.297222
Upvotes: 3
Reputation: 18367
You can use NamedAgg
to solve the issue if working with pandas 0.25+:
import pandas as pd
import numpy as np
data = {'weather':['good','good','good','good','good','bad','bad','bad','bad','rainy','rainy','rainy','rainy'],'speed':['0-3','0-3','0-3','0-3','3-5','0-3','6-9','6-9','6-9','0-3','5-8','5-8','5-8'],'type':['cold','cold','cold','cold','cold','hot','hot','hot','cold','cold','cold','cold','cold'],'cal_A':[12,21,31,17,19,15,21,15,21,14,21,2,18],'cal_B':[10,7,5,1,17,4,13,7,4,7,10,3,16]}
df = pd.DataFrame(data)
df['divided'] = df['cal_A']/df['cal_B']
output = df.groupby(['weather','speed','type']).agg(
minimum=pd.NamedAgg(column='divided',aggfunc='min'),
maximum=pd.NamedAgg(column='divided',aggfunc='max'),
mean=pd.NamedAgg(column='divided',aggfunc='mean'))
print(output)
output:
minimum maximum mean
weather speed type
bad 0-3 hot 3.750000 3.750000 3.750000
6-9 cold 5.250000 5.250000 5.250000
hot 1.615385 2.142857 1.879121
good 0-3 cold 1.200000 17.000000 6.850000
3-5 cold 1.117647 1.117647 1.117647
rainy 0-3 cold 2.000000 2.000000 2.000000
5-8 cold 0.666667 2.100000 1.297222
Upvotes: 2