Is there a way to obtain weighted dummy variables using pandas? I have two dataframes, one with continuous values (the weights) and another with the categorical values:
import pandas as pd
df1 = pd.DataFrame(data=[[1., 3., 2.], [2., 1.], [0.], [0., 2., 2.], [0., 2.]])
df2 = pd.DataFrame(data=[['a', 'c', 'd'], ['a', 'b'], ['c'], ['b', 'c', 'd'], ['a', 'b']])
The idea is to obtain a dummy dataframe, but with weighted dummy variables. For example, in row 0 the weights sum to 1.0 + 3.0 + 2.0 = 6.0, which counts as 100%, so instead of 0 and 1 the dummy variables should be:
a = 1.0/6.0
c = 3.0/6.0
d = 2.0/6.0
Each of these results should be the corresponding entry in the dummy dataframe.
What I currently get is only 0 or 1: 0 if the value is NaN and 1 if it exists:
dummies = pd.get_dummies(df2, columns=[0,1,2])
What I want is the same kind of matrix, but with the weighted dummy variables in place of the 1s and 0s, because a, b and c have different importance in my model.
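To make the target concrete, here is a minimal sketch of one way the weighted matrix could be built. It rests on assumptions the question does not state: the value at position (i, j) in df1 is taken to be the weight of the category at position (i, j) in df2, there is one output column per category (a, b, c, d) rather than one per position as in the get_dummies call above, and a row whose weights sum to zero (row 2 here) is left as all zeros. The names shares, parts and weighted are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(data=[[1., 3., 2.], [2., 1.], [0.], [0., 2., 2.], [0., 2.]])
df2 = pd.DataFrame(data=[['a', 'c', 'd'], ['a', 'b'], ['c'], ['b', 'c', 'd'], ['a', 'b']])

# Divide every weight by its row total so each row sums to 1.
# Missing weights stay NaN; a row total of 0 also yields NaN (0/0).
shares = df1.div(df1.sum(axis=1), axis=0)

parts = []
for col in df2.columns:
    # 0/1 indicators for the categories found at this position.
    indicators = pd.get_dummies(df2[col], dtype=float)
    # Scale each row's indicator by that row's share (assumption: NaN -> 0).
    parts.append(indicators.mul(shares[col].fillna(0.0), axis=0))

# Put the per-position blocks side by side, then add up columns that
# carry the same category label.
weighted = pd.concat(parts, axis=1).T.groupby(level=0).sum().T

print(weighted)
```

With the sample data, row 0 should come out as a = 1/6, c = 3/6, d = 2/6 and b = 0, matching the values listed above. If a category appeared twice in the same row, its shares would be summed in this sketch.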