Reputation: 61
I have the following dataframe, cr_df, which shows the rate at which ID1 converts to ID2
   ID1 ID2  Conversion Rate
0    1   A         0.046562
1    1   B         0.315975
2    1   C         0.577998
3    1   D         0.059465
4    2   A         0.6
5    2   B         0.4
Then I have another dataframe, raw_df, in the format of ID1 such as:
   ID1  Value
0    1    100
1    2    200
My goal is to output a dataframe final_df, in the ID2 format that looks something like:
  ID2  Value
0   C    100
1   A    200
The mapping from ID1 to ID2 should be random: draw a random value between 0 and 1 and pick the ID2 according to the conversion rates for that ID1.
How can I achieve this in pandas? (Do I need to use .apply?)
Upvotes: 1
Views: 1002
Reputation: 880777
Given this setup:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'ID1': [1]*4 + [2]*2,
    'ID2': list('ABCDAB'),
    'Conversion Rate': [0.046562, 0.315975, 0.577998, 0.059465, 0.6, 0.4]})
raw_df = pd.DataFrame({'ID1': [1,2], 'Value':[100, 200]})
you could define a function random_id2:
def random_id2(x):
    # Draw one ID2 from this ID1 group, weighted by its conversion rates
    return np.random.choice(x['ID2'], p=x['Conversion Rate'].values)
and use groupby/apply:
id2 = df.groupby(['ID1']).apply(random_id2)
to obtain a Series such as the following (the draw is random, so the values can differ between runs):
ID1
1 C
2 A
dtype: object
You could then build final_df by mapping raw_df['ID1'] values to id2 values:
final_df = raw_df.copy()
final_df['ID1'] = final_df['ID1'].map(id2)
final_df = final_df.rename(columns={'ID1': 'ID2'})
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID1': [1]*4 + [2]*2,
    'ID2': list('ABCDAB'),
    'Conversion Rate': [0.046562, 0.315975, 0.577998, 0.059465, 0.6, 0.4]})
raw_df = pd.DataFrame({'ID1': [1,2], 'Value':[100, 200]})

def random_id2(x):
    # Draw one ID2 from this ID1 group, weighted by its conversion rates
    return np.random.choice(x['ID2'], p=x['Conversion Rate'].values)

id2 = df.groupby(['ID1']).apply(random_id2)
final_df = raw_df.copy()
final_df['ID1'] = final_df['ID1'].map(id2)
final_df = final_df.rename(columns={'ID1': 'ID2'})
print(final_df)
yields
  ID2  Value
0   C    100
1   A    200
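One thing to be aware of: the groupby/apply above draws a single ID2 per ID1, so if raw_df contained several rows with the same ID1 they would all receive the same ID2. If an independent draw per row is wanted instead, something along these lines might work. This is only a sketch: the repeated-ID1 raw_df, the lookup dictionary, the draw_id2 helper and the seed are all illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

cr_df = pd.DataFrame({
    'ID1': [1]*4 + [2]*2,
    'ID2': list('ABCDAB'),
    'Conversion Rate': [0.046562, 0.315975, 0.577998, 0.059465, 0.6, 0.4]})

# Hypothetical input with a repeated ID1, to show per-row draws
raw_df = pd.DataFrame({'ID1': [1, 1, 2], 'Value': [100, 150, 200]})

# Seeded generator so the result is reproducible (the seed is arbitrary)
rng = np.random.default_rng(42)

# For each ID1, store the candidate ID2 values and their probabilities
lookup = {
    id1: (g['ID2'].to_numpy(), g['Conversion Rate'].to_numpy())
    for id1, g in cr_df.groupby('ID1')
}

def draw_id2(id1):
    ids, probs = lookup[id1]
    # Normalise defensively in case the rates do not sum to exactly 1
    return rng.choice(ids, p=probs / probs.sum())

# One independent weighted draw per row of raw_df
final_df = pd.DataFrame({
    'ID2': raw_df['ID1'].map(draw_id2),
    'Value': raw_df['Value'],
})
print(final_df)
```

For large frames it would probably be faster to draw all samples for one ID1 at once (rng.choice accepts a size argument) rather than calling the helper row by row.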
Upvotes: 1
Reputation: 76406
You can do a combination of the following:
To make a weighted random choice of the rows, use the answer in this question; specifically, make a weighted selection of range(len(df)) with the weights given by df['Conversion Rate'].
To select the rows with the given indices, see here.
To join the resulting dataframe with the second one, use merge
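The three steps above could be sketched roughly as follows. Note one assumption that goes beyond the answer as written: the weighted choice is made within each ID1 group rather than over range(len(df)) as a whole, since the rates only sum to 1 per ID1. The pick_row helper and the seed are illustrative names, not anything from the linked answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID1': [1]*4 + [2]*2,
    'ID2': list('ABCDAB'),
    'Conversion Rate': [0.046562, 0.315975, 0.577998, 0.059465, 0.6, 0.4]})
raw_df = pd.DataFrame({'ID1': [1, 2], 'Value': [100, 200]})

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

def pick_row(group):
    # Step 1: weighted choice of one row label within this ID1 group
    p = group['Conversion Rate'].to_numpy()
    return rng.choice(group.index.to_numpy(), p=p / p.sum())

chosen = [pick_row(g) for _, g in df.groupby('ID1')]

# Step 2: select the chosen rows by their index labels
picked = df.loc[chosen, ['ID1', 'ID2']]

# Step 3: join with the second dataframe on ID1
final_df = raw_df.merge(picked, on='ID1', how='left')[['ID2', 'Value']]
print(final_df)
```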
Upvotes: 1