I have a dataframe, and I want to create weighted columns based on the number of times each value appears per label in column w.
import pandas as pd

df = pd.DataFrame(
    [['a', 1, 1, 1],
     ['a', 1, 23, 10],
     ['a', 1, 2, 2],
     ['b', 1, 14, 2],
     ['a', 255, 255, 255]],
    columns=['w', 'r', 'g', 'b'],
)
In column w there are four instances of a and one of b. For label a in column r there are three 1's (75%) and one 255 (25%); for label b there is a single 1 (100%). How can I multiply the original values so that each one is weighted by the fraction of times it appears in its column, grouped by the label in column w, using pandas?
As an example, column r would become:

[a 1 * 0.75     # value * within-label percentage
 a 1 * 0.75
 a 1 * 0.75
 b 1 * 1.0
 a 255 * 0.25]
I want to do this for each of the columns r, g and b.
I've managed to factor my numbers down based on the number of times they appear per label, but I'm not sure how to apply this back to the original dataframe:
weighted = pd.DataFrame()
cols = [x for x in df.columns if x != 'w']
for col in cols:
    # any other value column serves as the count column for agg('count')
    other_col = [x for x in cols if x != col][0]
    g = df.groupby(by=['w', col]).agg('count')
    # value * (count of value within its label / total count for that label)
    w = pd.DataFrame(
        g.index.get_level_values(1)
        * (g[other_col]
           / g[other_col].groupby('w').sum()[g[other_col].index.get_level_values(0)].values)
    )
    weighted = pd.concat([weighted, w], axis=0)
weighted
Upvotes: 1
Views: 66
You could use GroupBy.transform together with value_counts(normalize=True):
percentage = df.groupby('w').transform(lambda x: x.map(x.value_counts(normalize=True)))
print(percentage)
Output
r g b
0 0.75 0.25 0.25
1 0.75 0.25 0.25
2 0.75 0.25 0.25
3 1.00 1.00 1.00
4 0.25 0.25 0.25
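The lambda runs once per group and per column: value_counts(normalize=True) builds a value-to-frequency table, and map looks each original value up in that table. A minimal sketch on a single Series (the a-group values of column r):

```python
import pandas as pd

# relative frequency of each distinct value within the series
s = pd.Series([1, 1, 1, 255])
freq = s.value_counts(normalize=True)

# map replaces each value with its relative frequency
print(s.map(freq).tolist())  # [0.75, 0.75, 0.75, 0.25]
```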
If you want to overwrite the original DataFrame, do:
df.iloc[:, 1:] = df.groupby('w').transform(lambda x: x.map(x.value_counts(normalize=True)))
print(df)
Output
w r g b
0 a 0.75 0.25 0.25
1 a 0.75 0.25 0.25
2 a 0.75 0.25 0.25
3 b 1.00 1.00 1.00
4 a 0.25 0.25 0.25
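Note that the question's example asks for the weighted values themselves (e.g. 255 * 0.25), not only the percentages. A sketch that multiplies the weights back into the original columns, assuming the df from the question:

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 1, 1, 1],
     ['a', 1, 23, 10],
     ['a', 1, 2, 2],
     ['b', 1, 14, 2],
     ['a', 255, 255, 255]],
    columns=['w', 'r', 'g', 'b'],
)

# within-label frequency of each value, per column
percentage = df.groupby('w').transform(lambda x: x.map(x.value_counts(normalize=True)))

# multiply the original numeric columns by their weights
weighted_values = df[['r', 'g', 'b']] * percentage
print(weighted_values)  # e.g. row 4, column r: 255 * 0.25 = 63.75
```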
Upvotes: 1