Daniel Arges
Daniel Arges

Reputation: 365

how to apply a multiplier on a dataframe based on conditions?

Given this first dataframe df_1:

df_1 = pd.DataFrame({'id':[1,2,1,3,1],
                     'symbol':['A','B','C','A','A'],
                     'date':['2021-02-12','2021-02-09','2021-02-14','2021-02-02','2021-02-05'],
                     'value':[1,1,1,1,1]})
   id symbol        date  value
0   1      A  2021-02-12      1
1   2      B  2021-02-09      1
2   1      C  2021-02-14      1
3   3      A  2021-02-02      1
4   1      A  2021-02-05      1

And given this second df_2:

df_2 = pd.DataFrame({'id_symbol':['1_A', '1_A'],
                     'init_date':['2021-02-01','2021-02-01'],
                     'end_date':['2021-02-05', '2021-02-12'],
                     'multiplier':[5,2]})

I need to apply this df_2.multiplier on df_1 for rows on df_1.value where the concat of id and _ and symbol is equals the df_2.id_symbol, and if the df_1.date is within the df_2.init_date and df_2.end_date.

My result should be like this, after the code:

   id symbol        date   value
0   1      A  2021-02-12       2
1   2      B  2021-02-09       1
2   1      C  2021-02-14       1
3   3      A  2021-02-02       1
4   1      A  2021-02-05      10

5 = 1 * 5 // 10 = 1 * 5 * 2

My both dataframes are quite bigger than this.

Upvotes: 2

Views: 118

Answers (2)

ALollz
ALollz

Reputation: 59549

Since there's the possibility of multiplying the value more than once we will do a merge, find all possible matches and sum the multipliers to figure out how much we need to multiply the original row by.

First split the 'id_symbol' column and use datetime64 dtypes so we can do merges and comparisons:

import pandas as pd
import numpy as np

df_1['date'] = pd.to_datetime(df_1['date'])
df_2['init_date'] = pd.to_datetime(df_2['init_date'])
df_2['end_date'] = pd.to_datetime(df_2['end_date'])

df_2[['id', 'symbol']] = df_2['id_symbol'].str.split('_', expand=True)
df_2['id'] = df_2['id'].astype('int64')

Now merge on all possible multipliers, but set it to 1 if it's outside of the date range. Determine the total multiplier for that row (i.e. product of all multipliers that satisfy the date condition) and use that to multiply the with the original DataFrame (aligns on df_1's Index).

mult = df_1.reset_index().merge(df_2, on=['id', 'symbol'], how='left')
mult['multiplier'] = np.where(mult.date.between(mult.init_date, mult.end_date), 
                              mult.multiplier, 1)
mult = mult.groupby('index')['multiplier'].prod()
#index
#0     2.0
#1     1.0
#2     1.0
#3     1.0
#4    10.0
#Name: multiplier, dtype: float64

df_1['value'] = df_1['value']*mult

print(df1)
   id symbol       date  value
0   1      A 2021-02-12    2.0
1   2      B 2021-02-09    1.0
2   1      C 2021-02-14    1.0
3   3      A 2021-02-02    1.0
4   1      A 2021-02-05   10.0

Upvotes: 2

sophocles
sophocles

Reputation: 13821

I was able to get to your desired outcome with the following process. I will comment on the code to explain each step, but it is fairly simple:

# Our imports
import pandas as pd
import numpy as np

# Create the id_symbol column in df_1 and perform a left merge
df_1['id_symbol']=df_1['id'].astype(str)+"_"+df_1['symbol'].astype(str)
df_1 = pd.merge(df_1,df_2,how='left',on='id_symbol')

# Make sure you convert to datetime, and handle the NaT succesfully
for i in ['init_date','end_date']:
    df_1[i] = pd.to_datetime(df_1[i],errors='coerce')

# Use np.where to check whether the conditions you want are satisfied
df_1['value'] = np.where((df_1['init_date'] <= df_1['date']) & (df_1['date']  <= df_1['end_date']),
                         df_1['value']*df_1['multiplier'],1) 

# Drop the unwanted columns
df_1.drop(['id_symbol','init_date','end_date','multiplier'],axis=1,inplace=True)

Will get back your desired outcome:

   id symbol        date  value
0   1      A  2021-02-12    1.0
1   2      B  2021-02-09    1.0
2   1      C  2021-02-14    1.0
3   3      A  2021-02-02    1.0
4   1      A  2021-02-05    5.0

A bit of an ad-hoc, but seems to be working.

Upvotes: 2

Related Questions