I have a dataframe, and I want to create weighted columns based on the number of times each value appears per label in column w.
import pandas as pd

df = pd.DataFrame(
    [['a', 1, 1, 1],
     ['a', 1, 23, 10],
     ['a', 1, 2, 2],
     ['b', 1, 14, 2],
     ['a', 255, 255, 255]],
    columns=['w', 'r', 'g', 'b'],
)
In column w there are four instances of a and one of b. For label a in column r there are three 1's (75%) and one 255 (25%); for label b there is a single 1 (100%). How can I multiply the original values so that each one is weighted by the fraction of times it appears in its column, grouped by the label in column w, using pandas?
As an example, column r would become:

[a 1 * 0.75     # value * within-label percentage
 a 1 * 0.75
 a 1 * 0.75
 b 1 * 1.0
 a 255 * 0.25]
I want to do this for each of the columns r, g and b.
I've managed to factor my numbers down based on the number of times they appear per label, but I'm not sure how to apply this back to the original dataframe:
weighted = pd.DataFrame()
cols = [x for x in df.columns if x != 'w']
for col in cols:
    # any other value column serves as the count column for agg('count')
    other_col = [x for x in cols if x != col][0]
    g = df.groupby(by=['w', col]).agg('count')
    # value * (count of value within its label / total count for that label)
    w = pd.DataFrame(
        g.index.get_level_values(1)
        * (g[other_col]
           / g[other_col].groupby('w').sum()[g[other_col].index.get_level_values(0)].values)
    )
    weighted = pd.concat([weighted, w], axis=0)
weighted
Upvotes: 1
Views: 66
You could use GroupBy.transform together with value_counts(normalize=True):
percentage = df.groupby('w').transform(lambda x: x.map(x.value_counts(normalize=True)))
print(percentage)
Output
r g b
0 0.75 0.25 0.25
1 0.75 0.25 0.25
2 0.75 0.25 0.25
3 1.00 1.00 1.00
4 0.25 0.25 0.25
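The lambda runs once per group and per column: value_counts(normalize=True) builds a value-to-frequency table, and map looks each original value up in that table. A minimal sketch on a single Series (the a-group values of column r):

```python
import pandas as pd

# relative frequency of each distinct value within the series
s = pd.Series([1, 1, 1, 255])
freq = s.value_counts(normalize=True)

# map replaces each value with its relative frequency
print(s.map(freq).tolist())  # [0.75, 0.75, 0.75, 0.25]
```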
If you want to overwrite the original DataFrame, do:
df.iloc[:, 1:] = df.groupby('w').transform(lambda x: x.map(x.value_counts(normalize=True)))
print(df)
Output
w r g b
0 a 0.75 0.25 0.25
1 a 0.75 0.25 0.25
2 a 0.75 0.25 0.25
3 b 1.00 1.00 1.00
4 a 0.25 0.25 0.25
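Note that the question's example asks for the weighted values themselves (e.g. 255 * 0.25), not only the percentages. A sketch that multiplies the weights back into the original columns, assuming the df from the question:

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 1, 1, 1],
     ['a', 1, 23, 10],
     ['a', 1, 2, 2],
     ['b', 1, 14, 2],
     ['a', 255, 255, 255]],
    columns=['w', 'r', 'g', 'b'],
)

# within-label frequency of each value, per column
percentage = df.groupby('w').transform(lambda x: x.map(x.value_counts(normalize=True)))

# multiply the original numeric columns by their weights
weighted_values = df[['r', 'g', 'b']] * percentage
print(weighted_values)  # e.g. row 4, column r: 255 * 0.25 = 63.75
```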
Upvotes: 1