Reputation: 1973
I have a dataframe in Pandas like below -
import pandas as pd
data = {'Category': ['cat2','cat1','cat2','cat1','cat2','cat1','cat2','cat1','cat1','cat1','cat2'],
'values': [1,2,3,1,2,3,1,2,3,5,1]}
my_data = pd.DataFrame(data)
I get the minimum, maximum and average values per category values and I also rename the column names as shown below -
# Get the minimum value for each column
min_agg = my_data.groupby('Category').min().reset_index()
min_agg.columns = [str(col) + '_min' for col in min_agg.columns]
category_min values_min
0 cat1 1
1 cat2 1
# Get the minimum value for each column
max_agg = my_data.groupby('Category').max().reset_index()
max_agg.columns = [str(col) + '_max' for col in max_agg.columns]
category_max values_max
0 cat1 5
1 cat2 3
# Get average value for each column
avg_agg = my_data.groupby('Category').mean().reset_index()
avg_agg.columns = [str(col) + '_avg' for col in avg_agg.columns]
category_avg values_avg
0 cat1 2.666667
1 cat2 1.600000
# count number of zeros
zeros_agg = my_data.groupby('Category')['values'].apply(lambda column: (column == 0).sum()).reset_index()
zeros_agg.columns = [str(col) + '_zeros' for col in zeros_agg.columns]
Category_zeros values_zeros
0 cat1 0
1 cat2 0
Now, I want to vertically stack these 4 dataframes to get a final dataframe that has 8 records, 2 each per dataframe which looks like this -
category values
cat1_min 1
cat2_min 1
cat1_max 5
cat2_max 3
cat1_avg 2.666667
cat2_avg 1.600000
cat1_zeros 0
cat2_zeros 0
The first column in the output shows what aggregation is applied on which category and second column shows the corresponding value.
How do I do this using pandas ?
I tried
vertical_stack = pd.concat([min_agg, max_agg, avg_agg,zeros_agg], axis=0 , keys=['Category_min','Category_max','Category_avg','Category_zeros'])
but it did not give me expected output.
The aggregation shown here is only on 1 column but I have a bigger dataset and I am computing this aggregations on many columns.
Upvotes: 3
Views: 461
Reputation: 323226
I will take advantage of describe
s=df.groupby('Category')['values'].describe()[['mean','max','min']].stack()
s.index=s.index.map('_'.join)
s
cat1_mean 2.666667
cat1_max 5.000000
cat1_min 1.000000
cat2_mean 1.600000
cat2_max 3.000000
cat2_min 1.000000
dtype: float64
Upvotes: 1
Reputation: 402363
You can pivot on category, with your desired aggregation functions, then stack the result:
r = df.pivot_table(
columns='Category', values='values', aggfunc=['min', 'max', 'mean'])
r.columns = r.columns.map('_'.join)
r.T
values
min_cat1 1.000000
min_cat2 1.000000
max_cat1 5.000000
max_cat2 3.000000
mean_cat1 2.666667
mean_cat2 1.600000
This is similar to
r = df.groupby('Category')['values'].agg(['min', 'max', 'mean']).stack()
r.index = r.index.map('_'.join)
r
cat1_min 1.000000
cat1_max 5.000000
cat1_mean 2.666667
cat2_min 1.000000
cat2_max 3.000000
cat2_mean 1.600000
dtype: float64
Upvotes: 2