Reputation: 913
I have this df:
   group owner  failed granted_pe  slots
0     g1    u1       0     single      1
1    g50   u92       0     shared      8
2    g50   u92       0     shared      1
df can be created using this code:
import pandas as pd

df = pd.DataFrame([['g1', 'u1', 0, 'single', 1],
                   ['g50', 'u92', '0', 'shared', '8'],
                   ['g50', 'u92', '0', 'shared', '1']],
                  columns=['group', 'owner', 'failed', 'granted_pe', 'slots'])
# The raw rows mix strings and ints, so cast every column to its intended dtype
df = df.astype(dtype={'group': 'str', 'owner': 'str', 'failed': 'int', 'granted_pe': 'str', 'slots': 'int'})
print(df)
Using groupby I create three columns calculated on the "slots" column:
df_calculated = pd.concat([
df.loc[:,['group', 'slots']].groupby(['group']).sum(),
df.loc[:,['group', 'slots']].groupby(['group']).mean(),
df.loc[:,['group', 'slots']].groupby(['group']).max()
], axis=1)
print(df_calculated)
       slots  slots  slots
group
g1         1    1.0      1
g50        9    4.5      8
Issue 1: Naming the new columns appropriately
Can I add an argument to concat to name these columns "slots_sum", "slots_avg", and "slots_max"?
Issue 2: Add columns to df
I would prefer to add the new columns to the df just to the right of the "source" column ("slots" in this case). Desired output would look something like this:
group owner failed granted_pe slots slots_sum slots_avg slots_max
0 g1 u1 0 single 1 1 1.0 1
1 g50 u92 0 shared 8 9 4.5 8
2 g50 u92 0 shared 1
My actual df is 4.5 mil rows, 23 cols. I will want to do something similar for other columns.
Upvotes: 0
Views: 197
Reputation: 153510
Another way is to use the keys parameter in pd.concat, then flatten the resulting MultiIndex column headers:
import pandas as pd

df = pd.DataFrame([['g1', 'u1', 0, 'single', 1],
                   ['g50', 'u92', '0', 'shared', '8'],
                   ['g50', 'u92', '0', 'shared', '1']],
                  columns=['group', 'owner', 'failed', 'granted_pe', 'slots'])
df = df.astype(dtype={'group': 'str', 'owner': 'str', 'failed': 'int', 'granted_pe': 'str', 'slots': 'int'})
# keys= adds an outer column level, giving columns like ('sum', 'slots')
df_calculated = pd.concat([
    df.loc[:, ['group', 'slots']].groupby(['group']).sum(),
    df.loc[:, ['group', 'slots']].groupby(['group']).mean(),
    df.loc[:, ['group', 'slots']].groupby(['group']).max()
], axis=1, keys=['sum', 'mean', 'max'])
# Flatten each (key, column) pair into a single name such as 'slots_sum'
df_calculated.columns = [f'{j}_{i}' for i, j in df_calculated.columns]
print(df_calculated)
Output:
       slots_sum  slots_mean  slots_max
group
g1             1         1.0          1
g50            9         4.5          8
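For comparison, the same column names can probably be produced without concat at all. The following is a minimal sketch, not part of the original answer, assuming a pandas version that supports named aggregation (0.25 or later); the variable name df_named is just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['g1', 'u1', 0, 'single', 1],
     ['g50', 'u92', 0, 'shared', 8],
     ['g50', 'u92', 0, 'shared', 1]],
    columns=['group', 'owner', 'failed', 'granted_pe', 'slots'])

# Named aggregation: each keyword is the output column name,
# each value is a (source column, aggregation function) pair.
# df_named is a hypothetical name used only for this sketch.
df_named = df.groupby('group').agg(
    slots_sum=('slots', 'sum'),
    slots_mean=('slots', 'mean'),
    slots_max=('slots', 'max'),
)
print(df_named)
```

This groups once instead of three times, which may matter on a frame with millions of rows.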
Upvotes: 2
Reputation: 323356
Use agg with add_prefix, then merge the result back:
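# Per-group statistics with prefixed names (stats is just an intermediate name)
stats = df.groupby('group')['slots'].agg(['sum', 'mean', 'max']).add_prefix('slots_').reset_index()
# Left-merge on the group key so every original row keeps its group's statistics
yourdf = df.merge(stats, on='group', how='left')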
Out[86]:
  group owner  failed  ...  slots_sum  slots_mean  slots_max
0    g1    u1       0  ...          1         1.0          1
1   g50   u92       0  ...          9         4.5          8
2   g50   u92       0  ...          9         4.5          8
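The merge above appends the new columns at the far right. If they need to sit immediately to the right of the source column, as the question asks, one possible approach is groupby.transform combined with DataFrame.insert. This is only a sketch under assumptions: the helper add_group_stats and its parameters are hypothetical names, not something from the answers here, and it has not been timed on a 4.5 million row frame.

```python
import pandas as pd

df = pd.DataFrame(
    [['g1', 'u1', 0, 'single', 1],
     ['g50', 'u92', 0, 'shared', 8],
     ['g50', 'u92', 0, 'shared', 1]],
    columns=['group', 'owner', 'failed', 'granted_pe', 'slots'])


def add_group_stats(frame, key, col, funcs=('sum', 'mean', 'max')):
    """Hypothetical helper: return a copy of `frame` with one new column
    per function in `funcs`, placed directly after `col`."""
    out = frame.copy()
    grouped = frame.groupby(key)[col]
    # Position just to the right of the source column
    pos = out.columns.get_loc(col) + 1
    for offset, func in enumerate(funcs):
        # transform broadcasts the group-level result back to every row
        out.insert(pos + offset, f'{col}_{func}', grouped.transform(func))
    return out


result = add_group_stats(df, 'group', 'slots')
print(result)
```

Because the source column is a parameter, the same call can be repeated for other columns, for example add_group_stats(result, 'group', 'failed').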
Upvotes: 4