Reputation: 359
I find myself lost trying to solve this problem (automating tax paperwork). I have two dataframes: one with the quarterly historical records of EUR/USD exchange rates, and another with my own invoices, as an example:
import pandas as pd
import numpy as np
usdeur = [(pd.Timestamp('20170705'),1.1329),
(pd.Timestamp('20170706'),1.1385),
(pd.Timestamp('20170707'),1.1412),
(pd.Timestamp('20170710'),1.1387),
(pd.Timestamp('20170711'),1.1405),
(pd.Timestamp('20170712'),1.1449)]
labels = ['Date', 'Rate']
rates = pd.DataFrame.from_records(usdeur, columns=labels)
transactions = [(pd.Timestamp('20170706'), 'PayPal', 'USD', 100, 1),
(pd.Timestamp('20170706'), 'Fastspring', 'USD', 200, 1),
(pd.Timestamp('20170709'), 'Fastspring', 'USD', 100, 1),
(pd.Timestamp('20170710'), 'EU', 'EUR', 100, 1),
(pd.Timestamp('20170710'), 'PayPal', 'USD', 200, 1)]
labels = ['Date', 'From', 'Currency', 'Amount', 'Rate']
sales =pd.DataFrame.from_records(transactions, columns=labels)
resulting in:
I would need to have the sales['Rate']
column filled with the proper exchange rates from the rates['Rate']
, that is to say:
sales['Currency']
is 'EUR'
, leave it alone.sales
, find the row in rates
with matching 'Date'
; grab that very rates['Rate']
value and put it in sales['Rate']
'Date'
(e.g. during holidays, the exchange market is closed), check the previous row until a suitable value is found.The full result should look like the following (note that row #2 has the rate from 2017-07-07):
I've tried to follow several suggested solutions from other questions, but with no luck. Thank you very much in advance
Upvotes: 3
Views: 6985
Reputation: 38415
Here is how I would do it without merge. 1. Fill rates with missing dates and ffill as with other answers but keep Date as index. 2. Map this dataframe to sales, use loc to not include rows with EUR
idx = pd.date_range(rates['Date'].min(), rates['Date'].max())
rates = rates.set_index('Date').reindex(idx).ffill()
sales.loc[sales['Currency'] != 'EUR','Rate'] = sales.loc[sales['Currency'] != 'EUR','Date'].map(rates['Rate'])
Date From Currency Amount Rate
0 2017-07-06 PayPal USD 100 1.1385
1 2017-07-06 Fastspring USD 200 1.1385
2 2017-07-09 Fastspring USD 100 1.1412
3 2017-07-10 EU EUR 100 1.0000
4 2017-07-10 PayPal USD 200 1.1387
Or you can even do it without changing the dataframe rates
mapper = rates.set_index('Date').reindex(sales['Date'].unique()).ffill()['Rate']
sales.loc[sales['Currency'] != 'EUR','Rate'] = sales.loc[sales['Currency'] != 'EUR','Date'].map(mapper)
Timetesting:
wen: 0.011892538983374834
gayatri: 0.13312408898491412
vaishali : 0.009498710976913571
Upvotes: 1
Reputation: 323226
I break down the steps , by using pd.merge_asof
sales=pd.merge_asof(sales,rates,on='Date',direction='backward',allow_exact_matches =True)
sales.loc[sales.From=='EU','Rate_y']=sales.Rate_x
sales
Out[748]:
Date From Currency Amount Rate_x Rate_y
0 2017-07-06 PayPal USD 100 1 1.1385
1 2017-07-06 Fastspring USD 200 1 1.1385
2 2017-07-09 Fastspring USD 100 1 1.1412
3 2017-07-10 EU EUR 100 1 1.0000
4 2017-07-10 PayPal USD 200 1 1.1387
Then
sales.drop('Rate_x',1).rename(columns={'Rate_y':'Rate'})
Out[749]:
Date From Currency Amount Rate
0 2017-07-06 PayPal USD 100 1.1385
1 2017-07-06 Fastspring USD 200 1.1385
2 2017-07-09 Fastspring USD 100 1.1412
3 2017-07-10 EU EUR 100 1.0000
4 2017-07-10 PayPal USD 200 1.1387
Upvotes: 2
Reputation: 2253
You can change your rates dataframe to include all the dates and then forward fill,create a column called "Currency" in your Rates Dataframe and then join the two df's on both the date & currency columns.
idx = pd.DataFrame(pd.date_range('2017-07-05', '2017-07-12'),columns=['Date'])
rates = pd.merge(idx,rates,how="left",on="Date")
rates['Currency'] = 'USD'
rates['Rate'] = rates['Rate'].ffill()
Date Rate Currency
0 2017-07-05 1.1329 USD
1 2017-07-06 1.1385 USD
2 2017-07-07 1.1412 USD
3 2017-07-08 1.1412 USD
4 2017-07-09 1.1412 USD
5 2017-07-10 1.1387 USD
6 2017-07-11 1.1405 USD
7 2017-07-12 1.1449 USD
then doing a left join would give:
result = pd.merge(sales,rates,how="left",on=["Currency","Date"])
result['Rate'] = np.where(result['Currency'] == 'EUR', 1, result['Rate_y'])
result = result.drop(['Rate_x','Rate_y'],axis =1)
would give:
Date From Currency Amount Rate
0 2017-07-06 PayPal USD 100 1.1385
1 2017-07-06 Fastspring USD 200 1.1385
2 2017-07-09 Fastspring USD 100 1.1412
3 2017-07-10 EU EUR 100 1.0000
4 2017-07-10 PayPal USD 200 1.1387
Upvotes: 2