Reputation: 1019
Suppose I have a Pandas Dataframe in the format below based on a 'Dict of Dicts 'Group'list (also below)…
ITEMS={
“Item_group1”:{‘Stuff’:’Some stuf’
‘More Stuff’:’Extra Stuff’
Group:[[Iteration1, 18, 25,0], [Iteration1, 43, 67,1], [Iteration1, 87, 76,1],
[Iteration2, 45, 29,0], [Iteration2, 44, 77,1], [Iteration2, 43, 74,0]],
}
“Item_group2”:{‘Stuff’:’Some stuf’
‘More Stuff’:’Extra Stuff’
Group:[[Iteration1, 75, 564,0], [Iteration1, 21, 87,1], [Iteration1, 7, 5,1],
[Iteration2, 54, 24,0], [Iteration2, 7, 45,1], [Iteration2, 45, 745,0]],
}
DataFrame in the following format….
Iteration Value1 Value2 Feature Active
Iteration1 18 25 0
Iteration1 3 67 1
Iteration1 87 76 1
Iteration2 45 29 0
Iteration2 44 7 1
Iteration2 43 74 0
How would I go about separating and calculating the averages for each iteration based on ‘Feature Active’ == 1, and ignore any ‘Feature Active’ ==0 entries?
I have the following code to calculate the stats per iteration for Value1 and Value2 after separating for ‘Iteration’ and ‘Feature Active’ as the keys, but it displays ‘Feature Active’ ==0 which I am not concerned with.
FeatureAvgs = Item_group1_DF.groupby(['Iteration’,’Feature Active'])
print np.round(FeatureAvgs[['Value1','Value2']].describe(), decimals=1)
Which produces the following output… (ignore the actual figures, this was taken from another dataframe)
Iteration Feature Enabled
Iteration1 0 count 3672.0 3672.0
mean -1352.5 0.0
std 220.5 0.0
min -1920.0 0.0
25% -1507.2 0.0
50% -1267.0 0.0
75% -1184.0 0.0
max -785.0 0.0
1 count 580.0 580.0
mean -1368.6 -1394.5
std 151.5 157.7
min -1788.0 -1805.0
25% -1454.2 -1490.2
50% -1335.5 -1361.0
75% -1270.0 -1291.0
max -1045.0 -1033.0
Iteration2 0 count 20612.0 20612.0
mean -1073.5 0.0
std 142.3 0.0
min -1730.0 0.0
25% -1088.0 0.0
50% -1036.0 0.0
75% -1005.0 0.0
max -805.0 0.0
1 count 14718.0 14718.0
mean -1113.6 -1161.1
std 129.3 134.9
min -1773.0 -1818.0
25% -1151.0 -1214.0
50% -1095.0 -1122.0
75% -1043.0 -1075.0
max -832.0 -897.0
But I am just after the mean averages for when the feature is active (==1). Sorry for the long question, but I am new to Pandas, and still getting through the documentation
Upvotes: 2
Views: 143
Reputation: 21873
If I understood well, you can do:
> df.groupby(["Feature Active", "Iteration"]).mean().loc[1]
Value1 Value2
Iteration
Iteration1 45 71.5
Iteration2 44 7.0
You groupby
first on the feature, and second on the Iteration variable. On each group you apply the mean()
function, and you get the group whose index is 1
, which correspond to the Feature Active == 1
group.
With:
> df
Iteration Value1 Value2 Feature Active
0 Iteration1 18 25 0
1 Iteration1 3 67 1
2 Iteration1 87 76 1
3 Iteration2 45 29 0
4 Iteration2 44 7 1
5 Iteration2 43 74 0
> df.groupby(["Feature Active", "Iteration"]).mean()
Value1 Value2
Feature Active Iteration
0 Iteration1 18 25.0
Iteration2 44 51.5
1 Iteration1 45 71.5
Iteration2 44 7.0
Tell me if it's not what you wanted.
HTH
Upvotes: 0
Reputation: 393963
Rather than filtering on the groupby object you can just filter the initial df first:
FeatureAvgs = Item_group1_DF[item_group1_DF['Feature Enabled'] == 1].groupby(['Iteration’,’Feature Active'])[['Value1','Value2']].mean()
Also there is no need to use describe
if you just want the mean
just use mean
, as an aside you can access the mean
column from the result of describe
by using:
print np.round(FeatureAvgs[['Value1','Value2']].describe()['mean'], decimals=1)
Upvotes: 1