Leandro Baruch

Reputation: 117

DataFrame.sample - Weights - How to use it?

I have this situation: I have a probability of 0.1348 stored in a variable called treat_conv.

Now I am trying to create a new dataframe from the original dataframe, using this probability to decide which values of a specified column are drawn. Is that possible? I am trying to use weights, but with no success. Maybe I am using it wrong?

Here is my code:

weights = np.array(treat_conv)  # creating an array with treat_conv
# creating a new dataframe with the number of rows of treat_group;
# the column converted must have a 0.13 chance of bringing value 1
new_page_converted = df2.sample(n = treat_group.shape[0], weights=df2.converted(weights))

The code works if I use n alone: it creates a new dataframe with the correct number of rows. But I can't get the correct probability of drawing the value 1 in the converted column.

I hope my explanation is understandable. Thank you!

Upvotes: 0

Views: 1322

Answers (1)

tk78

Reputation: 957

You could do something like this:

import pandas as pd
import numpy as np


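# "df" stands in for your original DataFrame
df = pd.DataFrame(data=np.arange(0, 100, 1), columns=["SomeValue"])
# Randomly pick 13% of the values, without replacement
selected = pd.DataFrame(data=np.random.choice(df["SomeValue"], int(len(df["SomeValue"]) * 0.13), replace=False),
                        columns=["SomeValue"])
# Flag the picked values
selected["Trigger"] = 1
# Merge the flag back; rows that were not picked get NaN, which is then set to 0
df = df.merge(selected, how="left", on="SomeValue")
# Assign the result instead of calling fillna(inplace=True) on the column,
# which may not update df in recent pandas versions
df["Trigger"] = df["Trigger"].fillna(0)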

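Here "df" stands in for your original DataFrame. The code selects a random 13% of the values, adds a column flagging them as selected, and finally merges the flag back into the original DataFrame. Note that the merge is done on "SomeValue", so this only flags exactly 13% of the rows if those values are unique.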
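If you would rather stay with DataFrame.sample, note that weights expects one weight per row (or the name of a column holding them), not a single probability. Below is a rough sketch of one way to do that, not necessarily the only one. It assumes df2 has a 0/1 column named converted; the small df2 built here and n_rows are made-up stand-ins for your real data and for treat_group.shape[0]. Sampling is done with replacement so that each draw returns converted == 1 with probability treat_conv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: 1000 rows with a 0/1 "converted" column
rng = np.random.default_rng(0)
df2 = pd.DataFrame({"converted": rng.integers(0, 2, size=1000)})

treat_conv = 0.1348  # target probability of drawing converted == 1
n_rows = 5000        # stand-in for treat_group.shape[0]

# One weight per row: all rows with converted == 1 share a total weight of
# treat_conv, and all rows with converted == 0 share 1 - treat_conv
is_conv = df2["converted"] == 1
weights = np.where(is_conv,
                   treat_conv / is_conv.sum(),
                   (1 - treat_conv) / (~is_conv).sum())

# replace=True keeps every draw independent with the same probabilities
new_page_converted = df2.sample(n=n_rows, replace=True,
                                weights=weights, random_state=0)

# Share of 1s in the sample; it should land close to treat_conv
share = new_page_converted["converted"].mean()
print(share)
```

Because the draws are random, the share of 1s will be close to treat_conv rather than exactly equal to it, and the approach needs at least one row of each kind in df2.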

Upvotes: 1
