Reputation: 680
I want to combine columns of a pandas DataFrame after a groupby. I looked for options I could use, but none of them does what I need. The closest is .agg(), which operates on the values of a single column; however, I want to calculate a statistic across all feature columns for every grouped row.
I am looking for something like this:
dataset.groupby(['company', 'team']).combine(new_cols=['features_mean'], to_combine=['feature 1':'feature 2'], funcs=[np.mean], axis=1)
Upvotes: 1
Views: 138
Reputation: 680
I realized that I don't even need to use groupby. I can simply use apply:
dataset['new measure'] = dataset.apply(lambda r: r['Feature 1':'Feature 12'].mean(), axis=1)
However, it runs slowly, because apply with axis=1 loops over the rows in Python.
Upvotes: 0
Reputation: 863741
Use loc with mean:
dataset['new measure'] = dataset.loc[:, 'Feature 1':'Feature 12'].mean(axis=1)
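One thing to keep in mind: a label slice in loc includes both endpoints and every column that sits between them, so the result depends on column order. The sketch below is only an illustration on a made-up frame (the names df, id and note are hypothetical, not from the question); it shows DataFrame.filter(like=...) as one possible way to pick the feature columns by name instead.

```python
import pandas as pd

# Hypothetical frame; the column names are illustrative only.
df = pd.DataFrame({
    'id': ['x', 'y'],
    'Feature 1': [1, 2],
    'note': ['p', 'q'],   # unrelated column sitting between the features
    'Feature 2': [3, 6],
})

# The label slice also picks up 'note', because it lies between the endpoints.
sliced = df.loc[:, 'Feature 1':'Feature 2']

# Selecting by name does not depend on where the columns sit.
by_name = df.filter(like='Feature')

# Row-wise mean over the selected feature columns only.
row_mean = by_name.mean(axis=1)
```

If the feature columns are contiguous, as in the question, the loc slice and the name-based selection give the same columns.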
Sample:
import pandas as pd

dataset = pd.DataFrame({'A': list('abcdef'),
                        'Feature 1': [4, 5, 4, 5, 5, 4],
                        'Feature 2': [7, 8, 9, 4, 2, 3],
                        'Feature 3': [1, 3, 5, 7, 1, 0],
                        'Feature 4': [5, 3, 6, 9, 2, 4],
                        'F': list('aaabbb')})

# Row-wise mean over the columns from 'Feature 1' through 'Feature 4' (both endpoints included)
dataset['new measure'] = dataset.loc[:, 'Feature 1':'Feature 4'].mean(axis=1)
print(dataset)
   A  F  Feature 1  Feature 2  Feature 3  Feature 4  new measure
0  a  a          4          7          1          5         4.25
1  b  a          5          8          3          3         4.75
2  c  a          4          9          5          6         6.00
3  d  b          5          4          7          9         6.25
4  e  b          5          2          1          2         2.50
5  f  b          4          3          0          4         2.75
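If a per-group value is still wanted, as the question originally asked, the row-wise mean can be combined with groupby. This is only a sketch under an assumption: it treats the sample's F column as the grouping key (standing in for company and team) and reads "statistic of all features" as the mean of the row-wise means within each group. The names feature_cols, group_mean and group measure are made up for the illustration.

```python
import pandas as pd

dataset = pd.DataFrame({'A': list('abcdef'),
                        'Feature 1': [4, 5, 4, 5, 5, 4],
                        'Feature 2': [7, 8, 9, 4, 2, 3],
                        'Feature 3': [1, 3, 5, 7, 1, 0],
                        'Feature 4': [5, 3, 6, 9, 2, 4],
                        'F': list('aaabbb')})

feature_cols = ['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4']

# Row-wise mean over the feature columns.
dataset['new measure'] = dataset[feature_cols].mean(axis=1)

# One value per group: average the row-wise means within each 'F' group.
group_mean = dataset.groupby('F')['new measure'].mean()

# Same statistic broadcast back onto every row of its group.
dataset['group measure'] = dataset.groupby('F')['new measure'].transform('mean')

print(group_mean)
```

Using transform keeps the original row count, which is handy when the group statistic is needed next to the per-row values.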
Upvotes: 1