WilliamW
WilliamW

Reputation: 468

Merging two dataframe using index as key and date

I am trying to merge two dataframe using a FK and between two dates then save the output in a new dataframe.

Consider the below example:

# first_df 
FK    date          value1   value2 ... (more columns)
1     2019-01-01    50       50
1     2019-01-02    40       80
1     2019-01-03    80       20
1     2019-01-04    18       44
1     2019-01-05    120      50
1     2019-01-06    80       0
1     2019-01-10    60       65
1     2019-01-15    25       44
1     2019-01-25    20       20
2     2019-01-01    50       40
2     2019-01-02    80       45
...............................


# second_df
FK    date          percentage
1     2019-01-01    50
1     2019-01-05    80
1     2019-01-10    40
1     2019-01-15    60
1     2019-01-25    90
2     2019-01-01    48
2     2019-01-08    40
2     2019-01-20    48
......................


# output_df
FK    date          value1            value2 ... (more columns)
1     2019-01-01    50% of 50 = 25    50% of 50 = 25
1     2019-01-02    50% of 40 = 20    50% of 80 = 40
1     2019-01-03    50% of 80 = 40    50% of 20 = 10
1     2019-01-04    50% of 18 = 9     50% of 44 = 22
1     2019-01-05    80% of 120 = 96   80% of 50 = 40
1     2019-01-06    80% of 80 = 64    80% of 0 = 0
1     2019-01-10    40% of 60 = 24    40% of 65 = 26
1     2019-01-15    60% of 25 = 15    60% of 44 = 26.4
1     2019-01-25    90% of 20 = 18    90% of 20 = 18
2     2019-01-01    48% of 50 = 24    48% of 40 = 19.2
2     2019-01-02    48% of 80 = 38.4  48% of 45 = 21.6

Pay attention to the 1st record with FK 2 as you can see, the index is my FK.

The percentage is applied to all records with the same FK where my date is: second_df.date <= first_df.date < and second_df.date_NEXT

For instance, between 2019-01-01 and 2019-01-04 I apply the percentage 50 (coming from second_df)

I have been searching for a moment now for a clean and readable implementation... I know I can set the index on my fk and use apply on my df by specifying the "value1" column. But how would you handle this if there is something like more than 5 columns ?

Hope you will understand that I have little experience with pandas


EDIT1

data1 = {'FK':[1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
             'date':['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05', '2019-01-06', '2019-01-10', '2019-01-15', '2019-01-25', '2019-01-01', '2019-01-02'],
             'value1':[50, 40, 80, 18, 120, 80, 60, 25, 20, 50, 80]}
data2 = {'FK': [1, 1, 1, 1, 1, 2, 2],
             'date': ['2019-01-01', '2019-01-05', '2019-01-10', '2019-01-15', '2019-01-25', '2019-01-01',
                      '2019-01-08'],
             'percentage': [50, 80, 40, 60, 90, 48, 40]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged_df = pd.merge_asof(df1.sort_values('date'), df2.sort_values('date'), by='FK', on='date').sort_values('FK')

EXCEPTION happening using the above dataset: Function call with ambiguous argument types

In case you have this exception, this is because your column "on" (in my case FK) is not considered as a date, but as a string.

According to panda doc: [...]Furthermore this must be a numeric column, such as datetimelike, integer, or float.

Upvotes: 3

Views: 423

Answers (1)

BENY
BENY

Reputation: 323226

In your case we use merge_asof

df=pd.merge_asof(df1.sort_values('date'),df2.sort_values('date'),by='FK',on='date').sort_values('FK')

Then we have percentage and values in same df, we can do multiple

df[['value1','value2']]=df[['value1','value2']].mul(df.percentage/100,0)

Upvotes: 3

Related Questions