Reputation: 117
I have this situation:
A have a probability of 0.1348 calculated in a variable called treat_conv
Now, I am trying to create a dataframe from the original dataframe, using this probability to bring a especified column. Is that possible? I am trying to using weights
but no success. Maybe am I using it wrong?
Follow my code:
weights = np.array(treat_conv) #creating a array with treat_conv
new_page_converted = df2.sample(n = treat_group.shape[0], weights=df2.converted(weights)) #creating new dataframe with the number of rows of treat_group and the column converted must have a 0.13 of chance to bring value 1
So, the code works if I use the n
alone. It creates a new dataframe with the correct ammount of rows. But I cant get the correct probabiliy to bring certain ammount of value 1 in converted
column.
I hope my explanation is undestandable. Thank you!
Upvotes: 0
Views: 1322
Reputation: 957
You could do something like this
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.arange(0, 100, 1), columns=["SomeValue"])
selected = pd.DataFrame(data=np.random.choice(df["SomeValue"], int(len(df["SomeValue"]) * 0.13), replace=False),
columns=["SomeValue"])
selected["Trigger"] = 1
df = df.merge(selected, how="left", on="SomeValue")
df["Trigger"].fillna(0, inplace=True)
"df" is your original DataFrame. Then select random 13% of the values and add a column indicating they've been selected. Finally, merge all back to your original Dataframe.
Upvotes: 1