Reputation: 4158
For every column in the list cat_column, I need to loop over the list numerical_cols and get the mean and standard deviation. I have the code below that does this, but at the end of the second loop I need a final table with the respective cat_column plus the mean and standard deviation of all the numerical columns, like below.
Code1 Code2 Mean_Code1_CarAge Std_Code1_CarAge Mean_Code1_CarPrice Std_Code1_CarPrice Mean_Code2_CarAge Std_Code2_CarAge Mean_Code2_CarPrice Std_Code2_CarPrice
Code:
import numpy as np

cat_column = ["Code1", "Code2"]
numerical_cols = ['CarAge', 'CarPrice']

for base_col in cat_column:
    for col in numerical_cols:
        df = df.groupby(base_col)[col].agg([np.mean, np.std]).reset_index().rename(
            columns={'mean': 'mean_' + base_col + "_" + col, 'std': 'std_' + base_col + "_" + col})
Input:
Code1 Code2 CarAge CarPrice
AAA AA1 12 5000
BBB BB1 30 10000
CCC CC1 64 22000
AAA AA1 19 4000
BBB BB1 49 10000
Output:
Code1 Code2 Mean_Code1_CarAge Std_Code1_CarAge Mean_Code1_CarPrice Std_Code1_CarPrice Mean_Code2_CarAge Std_Code2_CarAge Mean_Code2_CarPrice Std_Code2_CarPrice
AAA AA1 15.5 4.95 4500 707.10 15.5 4.95 4500 707.10
BBB BB1 39.5 13.43 10000 0.00 39.5 13.43 10000 0.00
CCC CC1 64.0 NaN 22000 NaN 64.0 NaN 22000 NaN
Not sure how to do that dynamically in the above code. Any leads/suggestions would be appreciated.
Upvotes: 2
Views: 58
Reputation: 35646
Try groupby aggregate using a dictionary made from the values in numerical_cols, then reduce the MultiIndex columns using map, and lastly concat on axis=1:
import pandas as pd

df = pd.DataFrame({
    'Code1': {0: 'AAA', 1: 'BBB', 2: 'CCC', 3: 'AAA', 4: 'BBB'},
    'Code2': {0: 'AA1', 1: 'BB1', 2: 'CC1', 3: 'AA1', 4: 'BB1'},
    'CarAge': {0: 12, 1: 30, 2: 64, 3: 19, 4: 49},
    'CarPrice': {0: 5000, 1: 10000, 2: 22000, 3: 4000, 4: 10000}
})

cat_columns = ["Code1", "Code2"]
numerical_cols = ['CarAge', 'CarPrice']

# Create a dictionary to map keys to aggregation types
agg_d = {k: ['mean', 'std'] for k in numerical_cols}

dfs = []
for cat_column in cat_columns:
    # Groupby Agg to get aggs for each key in agg_d per group
    g = df.groupby(cat_column).aggregate(agg_d)
    # Reduce Multi Index
    g.columns = g.columns.map(lambda x: f'{x[1]}_{cat_column}_{x[0]}')
    # Reset Index
    g = g.reset_index()
    dfs.append(g)

# Concat on Axis 1
new_df = pd.concat(dfs, axis=1)
# Re Order Columns
new_df = new_df[[*cat_columns, *new_df.columns.difference(cat_columns)]]

print(new_df.to_string())
new_df:
Code1 Code2 mean_Code1_CarAge mean_Code1_CarPrice mean_Code2_CarAge mean_Code2_CarPrice std_Code1_CarAge std_Code1_CarPrice std_Code2_CarAge std_Code2_CarPrice
0 AAA AA1 15.5 4500 15.5 4500 4.949747 707.106781 4.949747 707.106781
1 BBB BB1 39.5 10000 39.5 10000 13.435029 0.000000 13.435029 0.000000
2 CCC CC1 64.0 22000 64.0 22000 NaN NaN NaN NaN
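The axis=1 concat lines the per-category frames up by their reset integer index, which works here because each categorical column produces the same number of groups in the same row order. The column order also follows the sorted Index.difference, so all the mean columns come before the std columns. If you need the exact interleaved Mean/Std ordering shown in the question, a minimal sketch (reusing cat_columns, numerical_cols and new_df from the code above) could rebuild the column list explicitly:
# Rebuild the interleaved order shown in the question:
# for each category column, mean then std of each numerical column.
ordered = [*cat_columns]
for cat_column in cat_columns:
    for col in numerical_cols:
        ordered += [f'mean_{cat_column}_{col}', f'std_{cat_column}_{col}']
new_df = new_df[ordered]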
Upvotes: 2