Reputation: 618
Supposed I have two pandas DataFrame
s as follows,
df_actual
ID NAME PRODUCT CHANNEL ACTUAL
1 A W G 233
1 A W M 4
1 A X G 188
1 A X M 4
1 A Y G 37
1 A Y M 4
1 A Z G 272
1 A Z M 4
and df_target
ID NAME PRODUCT TARGET
1 A W 289
1 A X 253
1 A Y 36
1 A Z 334
I am expecting to get the result as follows,
ID NAME PRODUCT CHANNEL ACTUAL TARGET
1 A W G 233 284 <<< 289 * (233/(233+4))
1 A W G 4 5 <<< 289 * (4/(233+4))
1 A X G 188 248 <<< 253 * (188/(248+5))
1 A X G 4 5 <<< 253 * (4/(248+5))
1 A Y M 37 32 …
1 A Y M 4 4 …
1 A Z M 272 329 …
1 A Z M 4 5 …
So basically what I want to do, adding a new column to the df_actual
called TARGET
. In order to compute the values within the column TARGET
,
ACTUAL
values in df_actual
for certain ID
, NAME
, PRODUCT
, and CHANNEL
, and then get the TARGET
value in df_target
for the same ID
, NAME
, and PRODUCT
as before.TARGET
values for each CHANNEL
using the contribution of each CHANNEL
to the total CHANNEL
for certain ID
, NAME
, and PRODUCT
.For example, to get the new TARGET
values for ID
1, NAME
A, PRODUCT
W, and CHANNEL
G, I need to use this formula 289 * (233 / (233 + 4)).
TARGET
value in df_target
.ACTUAL
value in df_actual
for CHANNEL
GACTUAL
value in df_actual
for CHANNEL
MMy real data is quite huge, more than 1 Mio rows. So if you can suggest the vectorize solution, I will be very thankful. Nevertheless, any suggestion will be much appreciated. Thanks.
Upvotes: 1
Views: 79
Reputation: 210932
Consider the following vectorized approach:
In [39]: m = df_actual.merge(df_target)
In [40]: m
Out[40]:
ID NAME PRODUCT CHANNEL ACTUAL TARGET
0 1 A W G 233 289
1 1 A W M 4 289
2 1 A X G 188 253
3 1 A X M 4 253
4 1 A Y G 37 36
5 1 A Y M 4 36
6 1 A Z G 272 334
7 1 A Z M 4 334
In [41]: m['TARGET'] = (m.TARGET * m.ACTUAL / m.groupby(['NAME','PRODUCT'])['ACTUAL'].transform('sum')).round()
In [42]: m
Out[42]:
ID NAME PRODUCT CHANNEL ACTUAL TARGET
0 1 A W G 233 284.0
1 1 A W M 4 5.0
2 1 A X G 188 248.0
3 1 A X M 4 5.0
4 1 A Y G 37 32.0
5 1 A Y M 4 4.0
6 1 A Z G 272 329.0
7 1 A Z M 4 5.0
Upvotes: 3