Reputation: 468
I am trying to merge two dataframe using a FK and between two dates then save the output in a new dataframe.
Consider the below example:
# first_df
FK date value1 value2 ... (more columns)
1 2019-01-01 50 50
1 2019-01-02 40 80
1 2019-01-03 80 20
1 2019-01-04 18 44
1 2019-01-05 120 50
1 2019-01-06 80 0
1 2019-01-10 60 65
1 2019-01-15 25 44
1 2019-01-25 20 20
2 2019-01-01 50 40
2 2019-01-02 80 45
...............................
# second_df
FK date percentage
1 2019-01-01 50
1 2019-01-05 80
1 2019-01-10 40
1 2019-01-15 60
1 2019-01-25 90
2 2019-01-01 48
2 2019-01-08 40
2 2019-01-20 48
......................
# output_df
FK date value1 value2 ... (more columns)
1 2019-01-01 50% of 50 = 25 50% of 50 = 25
1 2019-01-02 50% of 40 = 20 50% of 80 = 40
1 2019-01-03 50% of 80 = 40 50% of 20 = 10
1 2019-01-04 50% of 18 = 9 50% of 44 = 22
1 2019-01-05 80% of 120 = 96 80% of 50 = 40
1 2019-01-06 80% of 80 = 64 80% of 0 = 0
1 2019-01-10 40% of 60 = 24 40% of 65 = 26
1 2019-01-15 60% of 25 = 15 60% of 44 = 26.4
1 2019-01-25 90% of 20 = 18 90% of 20 = 18
2 2019-01-01 48% of 50 = 24 48% of 40 = 19.2
2 2019-01-02 48% of 80 = 38.4 48% of 45 = 21.6
Pay attention to the 1st record with FK 2 as you can see, the index is my FK.
The percentage is applied to all records with the same FK where my date is: second_df.date <= first_df.date < and second_df.date_NEXT
For instance, between 2019-01-01 and 2019-01-04 I apply the percentage 50 (coming from second_df)
I have been searching for a moment now for a clean and readable implementation... I know I can set the index on my fk and use apply on my df by specifying the "value1" column. But how would you handle this if there is something like more than 5 columns ?
Hope you will understand that I have little experience with pandas
EDIT1
data1 = {'FK':[1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
'date':['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05', '2019-01-06', '2019-01-10', '2019-01-15', '2019-01-25', '2019-01-01', '2019-01-02'],
'value1':[50, 40, 80, 18, 120, 80, 60, 25, 20, 50, 80]}
data2 = {'FK': [1, 1, 1, 1, 1, 2, 2],
'date': ['2019-01-01', '2019-01-05', '2019-01-10', '2019-01-15', '2019-01-25', '2019-01-01',
'2019-01-08'],
'percentage': [50, 80, 40, 60, 90, 48, 40]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged_df = pd.merge_asof(df1.sort_values('date'), df2.sort_values('date'), by='FK', on='date').sort_values('FK')
EXCEPTION happening using the above dataset: Function call with ambiguous argument types
In case you have this exception, this is because your column "on" (in my case FK) is not considered as a date, but as a string.
According to panda doc: [...]Furthermore this must be a numeric column, such as datetimelike, integer, or float.
Upvotes: 3
Views: 423
Reputation: 323226
In your case we use merge_asof
df=pd.merge_asof(df1.sort_values('date'),df2.sort_values('date'),by='FK',on='date').sort_values('FK')
Then we have percentage
and values in same df, we can do multiple
df[['value1','value2']]=df[['value1','value2']].mul(df.percentage/100,0)
Upvotes: 3