jovicbg
jovicbg

Reputation: 1553

Create new columns by grouping and aggregating multicolumns in pandas

I have a dataframe with about 50 columns, some of them are period_start_time, id, speed_throughput, etc. dataframe sample:

    id     period_start_time         speed_througput    ...
0    1     2017-06-14 20:00:00              6
1    1     2017-06-14 20:00:00              10
2    1     2017-06-14 21:00:00              2
3    1     2017-06-14 21:00:00              5
4    2     2017-06-14 20:00:00              8
5    2     2017-06-14 20:00:00              12
...

I have tried to go create two new columns by grouping two columns(id and period_start_time) and find avg and min of speed_trhoughput. The code I've tried:

df['Throughput_avg']=df.sort_values(['period_start_time'],ascending=False).groupby(['period_start_time','id'])[['speed_trhoughput']].max()
df['Throughput_min'] = df.groupby(['period_start_time', 'id'])[['speed_trhoughput']].min()

As you can see, there are two ways I've tried, but nothing works. The error message I received for both attempts:

 TypeError:incompatible index of inserted column with frame index

I suppose you know what my output needs to be, so there is no need to post it.

Upvotes: 1

Views: 1088

Answers (1)

piRSquared
piRSquared

Reputation: 294488

Option 1
Use agg in a groupby and join to attach to main dataframe

df.join(
    df.groupby(['id', 'period_start_time']).speed_througput.agg(
        ['mean', 'min']
    ).rename(columns={'mean': 'avg'}).add_prefix('Throughput_'),
    on=['id', 'period_start_time']
)

   id    period_start_time  speed_througput  Throughput_avg  Throughput_min
0   1  2017-06-14 20:00:00                6             8.0               6
1   1  2017-06-14 20:00:00               10             8.0               6
2   1  2017-06-14 21:00:00                2             3.5               2
3   1  2017-06-14 21:00:00                5             3.5               2
4   2  2017-06-14 20:00:00                8            10.0               8
5   2  2017-06-14 20:00:00               12            10.0               8

Option 2
Use transform in a groupby context and use assign to add the new columns

g = df.groupby(['id', 'period_start_time']).speed_througput.transform
df.assign(Throughput_avg=g('mean'), Throughput_min=g('min'))

   id    period_start_time  speed_througput  Throughput_avg  Throughput_min
0   1  2017-06-14 20:00:00                6             8.0               6
1   1  2017-06-14 20:00:00               10             8.0               6
2   1  2017-06-14 21:00:00                2             3.5               2
3   1  2017-06-14 21:00:00                5             3.5               2
4   2  2017-06-14 20:00:00                8            10.0               8
5   2  2017-06-14 20:00:00               12            10.0               8

Upvotes: 1

Related Questions