Davide Barranca
Davide Barranca

Reputation: 359

Conditionally fill column with value from another DataFrame based on row match in Pandas

I find myself lost trying to solve this problem (automating tax paperwork). I have two dataframes: one with the quarterly historical records of EUR/USD exchange rates, and another with my own invoices, as an example:

import pandas as pd
import numpy as np

usdeur = [(pd.Timestamp('20170705'),1.1329),
          (pd.Timestamp('20170706'),1.1385),
          (pd.Timestamp('20170707'),1.1412),
          (pd.Timestamp('20170710'),1.1387),
          (pd.Timestamp('20170711'),1.1405),
          (pd.Timestamp('20170712'),1.1449)]
labels = ['Date', 'Rate']
rates = pd.DataFrame.from_records(usdeur, columns=labels)

transactions = [(pd.Timestamp('20170706'), 'PayPal',     'USD', 100, 1),
                (pd.Timestamp('20170706'), 'Fastspring', 'USD', 200, 1),
                (pd.Timestamp('20170709'), 'Fastspring', 'USD', 100, 1),
                (pd.Timestamp('20170710'), 'EU',         'EUR', 100, 1),
                (pd.Timestamp('20170710'), 'PayPal',     'USD', 200, 1)]
labels = ['Date', 'From', 'Currency', 'Amount', 'Rate']
sales =pd.DataFrame.from_records(transactions, columns=labels)

resulting in:

enter image description here

I would need to have the sales['Rate'] column filled with the proper exchange rates from the rates['Rate'], that is to say:

The full result should look like the following (note that row #2 has the rate from 2017-07-07):

Processed result

I've tried to follow several suggested solutions from other questions, but with no luck. Thank you very much in advance

Upvotes: 3

Views: 6985

Answers (3)

Vaishali
Vaishali

Reputation: 38415

Here is how I would do it without merge. 1. Fill rates with missing dates and ffill as with other answers but keep Date as index. 2. Map this dataframe to sales, use loc to not include rows with EUR

idx = pd.date_range(rates['Date'].min(), rates['Date'].max())
rates = rates.set_index('Date').reindex(idx).ffill()
sales.loc[sales['Currency'] != 'EUR','Rate'] = sales.loc[sales['Currency'] != 'EUR','Date'].map(rates['Rate'])

    Date        From        Currency    Amount  Rate
0   2017-07-06  PayPal      USD         100     1.1385
1   2017-07-06  Fastspring  USD         200     1.1385
2   2017-07-09  Fastspring  USD         100     1.1412
3   2017-07-10  EU          EUR         100     1.0000
4   2017-07-10  PayPal      USD         200     1.1387

Or you can even do it without changing the dataframe rates

mapper = rates.set_index('Date').reindex(sales['Date'].unique()).ffill()['Rate']

sales.loc[sales['Currency'] != 'EUR','Rate'] = sales.loc[sales['Currency'] != 'EUR','Date'].map(mapper)

Timetesting:

wen:       0.011892538983374834
gayatri:   0.13312408898491412
vaishali : 0.009498710976913571

Upvotes: 1

BENY
BENY

Reputation: 323226

I break down the steps , by using pd.merge_asof

sales=pd.merge_asof(sales,rates,on='Date',direction='backward',allow_exact_matches =True)
sales.loc[sales.From=='EU','Rate_y']=sales.Rate_x

sales
Out[748]: 
        Date        From Currency  Amount  Rate_x  Rate_y
0 2017-07-06      PayPal      USD     100       1  1.1385
1 2017-07-06  Fastspring      USD     200       1  1.1385
2 2017-07-09  Fastspring      USD     100       1  1.1412
3 2017-07-10          EU      EUR     100       1  1.0000
4 2017-07-10      PayPal      USD     200       1  1.1387

Then

sales.drop('Rate_x',1).rename(columns={'Rate_y':'Rate'})
Out[749]: 
        Date        From Currency  Amount    Rate
0 2017-07-06      PayPal      USD     100  1.1385
1 2017-07-06  Fastspring      USD     200  1.1385
2 2017-07-09  Fastspring      USD     100  1.1412
3 2017-07-10          EU      EUR     100  1.0000
4 2017-07-10      PayPal      USD     200  1.1387

Upvotes: 2

Gayatri
Gayatri

Reputation: 2253

You can change your rates dataframe to include all the dates and then forward fill,create a column called "Currency" in your Rates Dataframe and then join the two df's on both the date & currency columns.

idx = pd.DataFrame(pd.date_range('2017-07-05', '2017-07-12'),columns=['Date'])
rates = pd.merge(idx,rates,how="left",on="Date")
rates['Currency'] = 'USD'
rates['Rate'] = rates['Rate'].ffill()           

     Date   Rate    Currency
0   2017-07-05  1.1329  USD
1   2017-07-06  1.1385  USD
2   2017-07-07  1.1412  USD
3   2017-07-08  1.1412  USD
4   2017-07-09  1.1412  USD
5   2017-07-10  1.1387  USD
6   2017-07-11  1.1405  USD
7   2017-07-12  1.1449  USD

then doing a left join would give:

result = pd.merge(sales,rates,how="left",on=["Currency","Date"])
result['Rate'] = np.where(result['Currency'] == 'EUR', 1, result['Rate_y'])
result = result.drop(['Rate_x','Rate_y'],axis =1)

would give:

     Date         From      Currency    Amount  Rate
0   2017-07-06  PayPal           USD    100 1.1385
1   2017-07-06  Fastspring       USD    200 1.1385
2   2017-07-09  Fastspring       USD    100 1.1412
3   2017-07-10  EU               EUR    100 1.0000
4   2017-07-10  PayPal           USD    200 1.1387

Upvotes: 2

Related Questions