Cleb
Cleb

Reputation: 26069

Groupby column name and add results as additional columns

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'stuff_1_var_1': range(5),
    'stuff_1_var_2': range(2, 7),
    'stuff_2_var_1': range(3, 8),
    'stuff_2_var_2': range(5, 10)
})

   stuff_1_var_1  stuff_1_var_2  stuff_2_var_1  stuff_2_var_2
0              0              2              3              5
1              1              3              4              6

I would like to groupby based on the column headers and then add the mean and median of each group as new columns. So my expected output looks like this:

   stuff_1_var_mean  stuff_1_var_median  stuff_2_var_mean  stuff_2_var_median
0                 1                   1                 4                   4
1                 2                   2                 5                   5

Brief explanation: we have two groups stuff_1_var_ and stuff_2_var_ for which would calculate the mean and median per row. So, e.g. for stuff_1_var_ it would be:

# values from stuff_1_var_1 and stuff_1_var_2
(0 + 2) / 2 = 1 and 
( 1 + 3) / 2 = 2

The values are then added as a new column stuff_1_var_mean; analogue for meadian and stuff_2_var_.

I got until:

df = df.T

pattern = df.index.str.extract('(^stuff_\d_var_)', expand=False)

dfgb = df.groupby(pattern).agg(['mean', 'median']).T

          stuff_1_var_  stuff_2_var_
0 mean               1             4
  median             1             4
1 mean               2             5
  median             2             5

How can I do the final step(s)?

Upvotes: 0

Views: 266

Answers (2)

jezrael
jezrael

Reputation: 863791

Your solution should be changed:

df = df.T

pattern = df.index.str.extract('(^stuff_\d_var_)', expand=False)

dfgb = df.groupby(pattern).agg(['mean', 'median']).T.unstack()
dfgb.columns = dfgb.columns.map(lambda x: f'{x[0]}{x[1]}')

print (dfgb)
   stuff_1_var_mean  stuff_1_var_median  stuff_2_var_mean  stuff_2_var_median
0                 1                   1                 4                   4
1                 2                   2                 5                   5
2                 3                   3                 6                   6
3                 4                   4                 7                   7
4                 5                   5                 8                   8

Unfortunately for axis=1 is not implemented agg, so possible solution is create mean and median separately and then concat:

dfgb = df.groupby(pattern, axis=1).agg(['mean','median'])

NotImplementedError: axis other than 0 is not supported

pattern = df.columns.str.extract('(^stuff_\d_var_)', expand=False)
g = df.groupby(pattern, axis=1)

dfgb = pd.concat([g.mean().add_suffix('mean'), 
                  g.median().add_suffix('median')], axis=1)
dfgb = dfgb.iloc[:, [0,2,1,3]]
print (dfgb)
   stuff_1_var_mean  stuff_1_var_median  stuff_2_var_mean  stuff_2_var_median
0                 1                   1                 4                   4
1                 2                   2                 5                   5
2                 3                   3                 6                   6
3                 4                   4                 7                   7
4                 5                   5                 8                   8

Upvotes: 1

YOLO
YOLO

Reputation: 21759

Here's a way you can do:

col = 'stuff_1_var_'
use_col = [x for x in df.columns if 'stuff_1' in x]

df[f'{col}mean'] = df[use_col].mean(1)
df[f'{col}median'] = df[use_col].median(1)

col2 = 'stuff_2_var_'
use_col = [x for x in df.columns if 'stuff_2' in x]

df[f'{col2}mean'] = df[use_col].mean(1)
df[f'{col2}median'] = df[use_col].median(1)

print(df.iloc[:,-4:]) # showing last four new columns

  stuff_1_var_mean  stuff_1_var_median  stuff_2_var_mean  stuff_2_var_median  
0               1.0                 1.0               4.0                 4.0  
1               2.0                 2.0               5.0                 5.0  
2               3.0                 3.0               6.0                 6.0  
3               4.0                 4.0               7.0                 7.0  
4               5.0                 5.0               8.0                 8.0  

Ofcourse, you can put it in a function to avoid repeating the same code.

Upvotes: 1

Related Questions