Elias Urra

Reputation: 93

Python Pandas Weighted dummy variables?

Is there a way to obtain weighted dummy variables using pandas? I have two dataframes: one with a continuous variable (df1) and another with the categorical values (df2).

import pandas as pd

df1 = pd.DataFrame(data=[[1., 3., 2.], [2., 1.], [0.], [0., 2., 2.], [0., 2.]])
df2 = pd.DataFrame(data=[['a', 'c', 'd'], ['a', 'b'], ['c'], ['b', 'c', 'd'], ['a', 'b']])

The idea is to obtain a dummy dataframe, but with weighted dummy variables. For row 0, the total 1.0 + 3.0 + 2.0 = 6.0 counts as 100%, so instead of 0 and 1 the dummy variables should be:

a = 1.0/6.0
c = 3.0/6.0
d = 2.0/6.0

Each of these results should be the corresponding value in the dummy dataframe.
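To make the arithmetic concrete, here is a minimal sketch of the row-wise normalisation, assuming the weights are simply each value divided by its row total. The names `row_totals` and `shares` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(data=[[1., 3., 2.], [2., 1.], [0.], [0., 2., 2.], [0., 2.]])

# Sum each row; missing cells (NaN) are skipped by default.
row_totals = df1.sum(axis=1)

# Divide every row by its own total so the non-missing cells sum to 1.
shares = df1.div(row_totals, axis=0)

# Row 0 holds 1/6, 3/6 and 2/6.
print(shares.loc[0].tolist())
```

Note that row 2 has a total of 0, so its share is 0/0 and comes out as NaN; how that case should be treated is a modelling decision.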

What I currently get is only 0 or 1: 0 if the value is NaN and 1 if it exists.

dummies = pd.get_dummies(df2, columns=[0,1,2])

And this is my output

What I want is the same matrix, but with the weighted dummy variables in place of the 1s and 0s, because a, b and c have different importance in my model.
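One possible way to get there, as a sketch rather than a definitive answer: keep the `pd.get_dummies` layout and multiply each dummy column by the normalised weight found at the same row and position in df1. This assumes the two dataframes are aligned cell by cell, that a row whose weights sum to 0 should become all zeros, and that the dummy columns are named `<position>_<category>` (the default for these integer column labels). The names `weights` and `weighted` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(data=[[1., 3., 2.], [2., 1.], [0.], [0., 2., 2.], [0., 2.]])
df2 = pd.DataFrame(data=[['a', 'c', 'd'], ['a', 'b'], ['c'], ['b', 'c', 'd'], ['a', 'b']])

# Normalise each row of weights by its total (NaN cells are skipped by sum()).
weights = df1.div(df1.sum(axis=1), axis=0)

# Assumption: missing cells and 0/0 rows (row 2 here) get a weight of 0.
weights = weights.fillna(0.0)

# Plain 0/1 dummies, one block of columns per position in df2.
dummies = pd.get_dummies(df2, columns=[0, 1, 2], dtype=float)

# Replace every 1 with the weight at the same row and position.
weighted = dummies.copy()
for col in dummies.columns:
    position = int(col.split('_')[0])  # "0_a" -> position 0
    weighted[col] = dummies[col] * weights[position]

print(weighted)
```

If a single column per category is preferred over one per position, the columns sharing a category could be summed afterwards.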

Upvotes: 0

Views: 196

Answers (0)
