MikG
MikG

Reputation: 1019

Pandas DataFrame manipulation based on Column value

Suppose I have a Pandas Dataframe in the format below based on a 'Dict of Dicts 'Group'list (also below)…

ITEMS={
    “Item_group1”:{‘Stuff’:’Some stuf’
            ‘More Stuff’:’Extra Stuff’
            Group:[[Iteration1, 18, 25,0], [Iteration1, 43, 67,1], [Iteration1, 87, 76,1],
                [Iteration2, 45, 29,0], [Iteration2, 44, 77,1], [Iteration2, 43, 74,0]],

            }
    “Item_group2”:{‘Stuff’:’Some stuf’
            ‘More Stuff’:’Extra Stuff’
            Group:[[Iteration1, 75, 564,0], [Iteration1, 21, 87,1], [Iteration1, 7, 5,1],
                [Iteration2, 54, 24,0], [Iteration2, 7, 45,1], [Iteration2, 45, 745,0]],
            }

DataFrame in the following format….

Iteration   Value1  Value2  Feature Active
Iteration1  18      25      0
Iteration1  3       67      1
Iteration1  87      76      1
Iteration2  45      29      0
Iteration2  44      7       1
Iteration2  43      74      0

How would I go about separating and calculating the averages for each iteration based on ‘Feature Active’ == 1, and ignore any ‘Feature Active’ ==0 entries?

I have the following code to calculate the stats per iteration for Value1 and Value2 after separating for ‘Iteration’ and ‘Feature Active’ as the keys, but it displays ‘Feature Active’ ==0 which I am not concerned with.

FeatureAvgs = Item_group1_DF.groupby(['Iteration’,’Feature Active'])
print np.round(FeatureAvgs[['Value1','Value2']].describe(), decimals=1)

Which produces the following output… (ignore the actual figures, this was taken from another dataframe)

Iteration   Feature Enabled
Iteration1  0               count   3672.0   3672.0
                            mean   -1352.5      0.0
                            std      220.5      0.0
                            min    -1920.0      0.0
                            25%    -1507.2      0.0
                            50%    -1267.0      0.0
                            75%    -1184.0      0.0
                            max     -785.0      0.0
            1               count    580.0    580.0
                            mean   -1368.6  -1394.5
                            std      151.5    157.7
                            min    -1788.0  -1805.0
                            25%    -1454.2  -1490.2
                            50%    -1335.5  -1361.0
                            75%    -1270.0  -1291.0
                            max    -1045.0  -1033.0
Iteration2  0               count  20612.0  20612.0
                            mean   -1073.5      0.0
                            std      142.3      0.0
                            min    -1730.0      0.0
                            25%    -1088.0      0.0
                            50%    -1036.0      0.0
                            75%    -1005.0      0.0
                            max     -805.0      0.0
            1               count  14718.0  14718.0
                            mean   -1113.6  -1161.1
                            std      129.3    134.9
                            min    -1773.0  -1818.0
                            25%    -1151.0  -1214.0
                            50%    -1095.0  -1122.0
                            75%    -1043.0  -1075.0
                            max     -832.0   -897.0

But I am just after the mean averages for when the feature is active (==1). Sorry for the long question, but I am new to Pandas, and still getting through the documentation

Upvotes: 2

Views: 143

Answers (2)

jrjc
jrjc

Reputation: 21873

If I understood well, you can do:

> df.groupby(["Feature Active", "Iteration"]).mean().loc[1]

            Value1  Value2
Iteration                 
Iteration1      45    71.5
Iteration2      44     7.0

You groupby first on the feature, and second on the Iteration variable. On each group you apply the mean() function, and you get the group whose index is 1, which correspond to the Feature Active == 1 group.

With:

> df

    Iteration  Value1  Value2  Feature Active
0  Iteration1      18      25               0
1  Iteration1       3      67               1
2  Iteration1      87      76               1
3  Iteration2      45      29               0
4  Iteration2      44       7               1
5  Iteration2      43      74               0


> df.groupby(["Feature Active", "Iteration"]).mean()

                           Value1  Value2
Feature Active Iteration                 
0              Iteration1      18    25.0
               Iteration2      44    51.5
1              Iteration1      45    71.5
               Iteration2      44     7.0

Tell me if it's not what you wanted.

HTH

Upvotes: 0

EdChum
EdChum

Reputation: 393963

Rather than filtering on the groupby object you can just filter the initial df first:

FeatureAvgs = Item_group1_DF[item_group1_DF['Feature Enabled'] == 1].groupby(['Iteration’,’Feature Active'])[['Value1','Value2']].mean()

Also there is no need to use describe if you just want the mean just use mean, as an aside you can access the mean column from the result of describe by using:

print np.round(FeatureAvgs[['Value1','Value2']].describe()['mean'], decimals=1)

Upvotes: 1

Related Questions