equanimity
equanimity

Reputation: 2533

Filtering a Pandas pivot table *during* the pivot

Let's assume we have the following data frame df:

df = pd.DataFrame({'food' : ['spam', 'ham', 'eggs', 'ham', 'ham', 'eggs', 'milk'],
                   'sales' : [10, 15, 12, 5, 14, 3, 8]})

I'd like to pivot this data to show the sum of sales by food, but only if sales is greater than 12. The resulting pivot table would look as follows:

Unfiltered df:

food       sum(sales)
spam       10
ham        34
eggs       15
milk       8

Filtered df:

food       sum(sales)
ham        34
eggs       15

I can use groupby() as follows:

df_new.groupby(['food'])['sales'].agg('sum') > 12

But, this only gives me the boolean and not the filtered df.

Is this possible to filter a column "on the fly" when using the pd.pivot_table() function? (i.e. without pre-filtering the df)

Upvotes: 2

Views: 938

Answers (2)

Alison
Alison

Reputation: 21

groupby produces a series object. It's not pretty, but you can subset it dynamically using:

df.groupby(['food'])['sales'].agg('sum')[df.groupby(['food'])['sales'].agg('sum')>12]

Upvotes: 1

user17242583
user17242583

Reputation:

You can pass a lambda function .loc which will filter the dataframe for only rows that match the condition that the lambda function returns:

filtered = df.groupby('food')['sales'].sum().reset_index().loc[lambda x: x['sales'] > 12]

Output:

>>> filtered
   food  sales
0  eggs     15
1   ham     34

(In case you're wondering, the lambda function gets executed for the whole dataframe, not for each individual row, so yes, it's very efficient :)

Upvotes: 4

Related Questions