Reputation: 47
I am analyzing a very large dataset with the attributes (G, month, p), which I group by G using pandas groupby.
G month p
G1 1 0.040698496
G1 2 0.225640771
G1 3 0.236948047
G1 4 0.119339576
G1 5 0.779272432
G2 1 0.892168636
G2 2 0.062467967
G2 3 0.936044226
G3 1 0.509212613
G3 2 0.476718744
G3 3 0.407299543
G3 4 0.843260893
G4 1 0.882554249
I then extracted the statistics for each group G, from 1 to n, as shown below:
g1 g2 g3 gn
mean 0.280379864 0.630226943 0.559122948 …
std 0.290326376 0.49218285 0.194135874 …
count 5 3 4 …
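For reference, a table like the one above could be produced along the following lines. This is only a sketch, assuming the data sits in a DataFrame named `df` with columns `G`, `month` and `p` (the name `df` and the transposed layout are assumptions, not something stated in the question):

```python
import pandas as pd

# Sample rows copied from the question; the real dataset has 200+ groups.
df = pd.DataFrame({
    "G": ["G1"] * 5 + ["G2"] * 3 + ["G3"] * 4 + ["G4"],
    "month": [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4, 1],
    "p": [0.040698496, 0.225640771, 0.236948047, 0.119339576, 0.779272432,
          0.892168636, 0.062467967, 0.936044226,
          0.509212613, 0.476718744, 0.407299543, 0.843260893,
          0.882554249],
})

# One row per statistic, one column per group (hence the transpose).
stats = df.groupby("G")["p"].agg(["mean", "std", "count"]).T
print(stats)
```

Note that `std` here is the sample standard deviation (pandas defaults to `ddof=1`), which matches the values shown in the table.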
I need to create a new field that is the product of the group's standard deviation (std) and the variable p. Is there a way to do this automatically? There are more than 200 groups, so doing it group by group takes a lot of time. The expected output is:
G month p STD*p
G1 1 0.040698496 0.011815847
G1 2 0.225640771 0.065509467
G1 3 0.236948047 0.068792268
G1 4 0.119339576 0.034647427
G1 5 0.779272432 0.226243341
G2 1 0.892168636 0.439110102
G2 2 0.062467967 0.030745662
G2 3 0.936044226 0.460704915
G3 1 0.509212613 0.098856436
G3 2 0.476718744 0.09254821
G3 3 0.407299543 0.079071453
G3 4 0.843260893 0.16370719
Upvotes: 1
Views: 52
Reputation: 862681
Use GroupBy.transform with std to repeat each group's aggregate value on every row of that group, so it can be multiplied by the p column:
df['STD*p'] = df.groupby('G')['p'].transform('std').mul(df['p'])
print(df)
G month p STD*p
0 G1 1 0.040698 0.011816
1 G1 2 0.225641 0.065509
2 G1 3 0.236948 0.068792
3 G1 4 0.119340 0.034647
4 G1 5 0.779272 0.226243
5 G2 1 0.892169 0.439110
6 G2 2 0.062468 0.030746
7 G2 3 0.936044 0.460705
8 G3 1 0.509213 0.098856
9 G3 2 0.476719 0.092548
10 G3 3 0.407300 0.079071
11 G3 4 0.843261 0.163707
12 G4 1 0.882554 NaN
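The NaN in the last row appears because G4 has only one row, and the sample standard deviation (pandas' default, `ddof=1`) is undefined for a single value. A self-contained sketch that reproduces the result, assuming the sample data from the question is loaded into a DataFrame named `df` (the intermediate name `group_std` is just for illustration):

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "G": ["G1"] * 5 + ["G2"] * 3 + ["G3"] * 4 + ["G4"],
    "month": [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4, 1],
    "p": [0.040698496, 0.225640771, 0.236948047, 0.119339576, 0.779272432,
          0.892168636, 0.062467967, 0.936044226,
          0.509212613, 0.476718744, 0.407299543, 0.843260893,
          0.882554249],
})

# transform('std') returns one value per original row (that row's group std),
# aligned to df's index, so it can be multiplied by p element-wise.
group_std = df.groupby("G")["p"].transform("std")
df["STD*p"] = group_std * df["p"]

# G4 has a single row, so its sample std (ddof=1) is NaN, and so is the product.
print(df[df["STD*p"].isna()])
```

If single-row groups need a value rather than NaN, calling `fillna` on the new column is one option, though which fallback makes sense depends on the analysis.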
Upvotes: 0