arnold
arnold

Reputation: 618

Python Pandas Merge Two DataFrames With Conditional Logic

Supposed I have two pandas DataFrames as follows,

df_actual

ID  NAME    PRODUCT CHANNEL ACTUAL
1   A       W       G       233
1   A       W       M       4
1   A       X       G       188
1   A       X       M       4
1   A       Y       G       37
1   A       Y       M       4
1   A       Z       G       272
1   A       Z       M       4

and df_target

ID  NAME    PRODUCT TARGET
1   A       W       289
1   A       X       253
1   A       Y       36
1   A       Z       334

I am expecting to get the result as follows,

ID  NAME    PRODUCT CHANNEL ACTUAL  TARGET  
1   A       W       G       233     284     <<< 289 * (233/(233+4))
1   A       W       G       4       5       <<< 289 * (4/(233+4))
1   A       X       G       188     248     <<< 253 * (188/(248+5))
1   A       X       G       4       5       <<< 253 * (4/(248+5))
1   A       Y       M       37      32      …
1   A       Y       M       4       4       …
1   A       Z       M       272     329     …
1   A       Z       M       4       5       …

So basically what I want to do, adding a new column to the df_actual called TARGET. In order to compute the values within the column TARGET,

  1. I have to get the ACTUAL values in df_actual for certain ID, NAME, PRODUCT, and CHANNEL, and then get the TARGET value in df_target for the same ID, NAME, and PRODUCT as before.
  2. Then compute the new TARGET values for each CHANNEL using the contribution of each CHANNEL to the total CHANNEL for certain ID, NAME, and PRODUCT.

For example, to get the new TARGET values for ID 1, NAME A, PRODUCT W, and CHANNEL G, I need to use this formula 289 * (233 / (233 + 4)).

My real data is quite huge, more than 1 Mio rows. So if you can suggest the vectorize solution, I will be very thankful. Nevertheless, any suggestion will be much appreciated. Thanks.

Upvotes: 1

Views: 79

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210932

Consider the following vectorized approach:

In [39]: m = df_actual.merge(df_target)

In [40]: m
Out[40]:
   ID NAME PRODUCT CHANNEL  ACTUAL  TARGET
0   1    A       W       G     233     289
1   1    A       W       M       4     289
2   1    A       X       G     188     253
3   1    A       X       M       4     253
4   1    A       Y       G      37      36
5   1    A       Y       M       4      36
6   1    A       Z       G     272     334
7   1    A       Z       M       4     334

In [41]: m['TARGET'] = (m.TARGET * m.ACTUAL / m.groupby(['NAME','PRODUCT'])['ACTUAL'].transform('sum')).round()

In [42]: m
Out[42]:
   ID NAME PRODUCT CHANNEL  ACTUAL  TARGET
0   1    A       W       G     233   284.0
1   1    A       W       M       4     5.0
2   1    A       X       G     188   248.0
3   1    A       X       M       4     5.0
4   1    A       Y       G      37    32.0
5   1    A       Y       M       4     4.0
6   1    A       Z       G     272   329.0
7   1    A       Z       M       4     5.0

Upvotes: 3

Related Questions