Reputation: 351
Trying to work with an exchange rate table and a transactions table, where the rate a transaction joins to depends on which exchange rate for its currency was the most recent at the time of the transaction. The table contains many duplicates, many currencies, and other fields not relevant to this issue.
I plan on joining the transactions using BETWEEN on the start date and end date of each currency rate and the date of the transaction; however, I don't have an end date field. The exchange rate for a currency ends when the next one starts.
For example:
Currency | Start Date | Rate |
---|---|---|
EUR | 01/12/2021 | 1.25 |
US | 01/12/2021 | 0.75 |
EUR | 25/12/2021 | 1.11 |
US | 10/12/2021 | 0.8 |
EUR | 01/12/2021 | 1.25 |
US | 25/12/2021 | 1.11 |
Should become:
Currency | Start Date | Rate | End Date |
---|---|---|---|
EUR | 01/12/2021 | 1.25 | 24/12/2021 |
US | 01/12/2021 | 0.75 | 09/12/2021 |
EUR | 25/12/2021 | 1.11 | today |
US | 10/12/2021 | 0.8 | 24/12/2021 |
US | 25/12/2021 | 1.11 | today |
I was thinking of:
with ordered_currency as (
select distinct Currency, Start_date
from exchange_rate_table
order by currency, start_date asc
)
This would produce a table of all currencies in order of date, removing any duplicates.
The next step would be to check whether the next line is the same currency; if it is, take its start date as the end date of the current line. If it's not the same currency, simply put current_date() as the end_date. I could then join this End_date field back onto the original table.
However, I'm not sure of the syntax that would be used to check whether the line below has the same value in a field (in this case, currency).
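For reference, the row-by-row logic being described can be sketched outside SQL; here's a minimal Python version (hypothetical sample rows, assuming the data is already deduplicated and sorted by currency then start date):

```python
from datetime import date, timedelta

# Hypothetical deduplicated rows, sorted by currency then start_date
rows = [
    ("EUR", date(2021, 12, 1), 1.25),
    ("EUR", date(2021, 12, 25), 1.11),
    ("US", date(2021, 12, 1), 0.75),
    ("US", date(2021, 12, 10), 0.8),
]

with_end = []
for i, (cur, start, rate) in enumerate(rows):
    nxt = rows[i + 1] if i + 1 < len(rows) else None
    if nxt is not None and nxt[0] == cur:
        # Next line is the same currency: the day before its start date ends this rate
        end = nxt[1] - timedelta(days=1)
    else:
        # Last rate for this currency: it is still in effect today
        end = date.today()
    with_end.append((cur, start, rate, end))
```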
Any help would be appreciated!
Upvotes: 1
Views: 57
Reputation: 25903
So another method, one that doesn't use JOINs, is:
With a CTE for the input data:
WITH fake_data(currency, start_date, rate) AS (
SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3
FROM VALUES
('EUR','01/12/2021', 1.25),
('US','01/12/2021', 0.75),
('EUR','25/12/2021', 1.11),
('US','10/12/2021', 0.8),
('EUR','01/12/2021', 1.25),
('US','25/12/2021', 1.11)
)
then to get the end_time for each row (same as Caius's answer):
SELECT currency
,start_date
,rate
,lead(start_date,1,current_date) over (order by start_date) as end_date
FROM fake_data
ORDER BY 2;
gives:
CURRENCY | START_DATE | RATE | END_DATE |
---|---|---|---|
EUR | 2021-12-01 | 1.25 | 2021-12-01 |
US | 2021-12-01 | 0.75 | 2021-12-01 |
EUR | 2021-12-01 | 1.25 | 2021-12-10 |
US | 2021-12-10 | 0.8 | 2021-12-25 |
EUR | 2021-12-25 | 1.11 | 2021-12-25 |
US | 2021-12-25 | 1.11 | 2022-02-02 |
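As a rough sanity check of what LEAD(start_date, 1, current_date) is doing, the same behaviour can be mimicked in plain Python (a sketch, not Snowflake itself):

```python
from datetime import date

def lead(values, offset=1, default=None):
    # LEAD(v, offset, default): the value `offset` rows ahead, else the default
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

# start dates in ORDER BY start_date order
start_dates = [date(2021, 12, 1), date(2021, 12, 1), date(2021, 12, 10),
               date(2021, 12, 25), date(2021, 12, 25)]
end_dates = lead(start_dates, 1, date.today())  # last row falls back to today
```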
But to now chain the results together we want to filter out streaks of the same currency, which we can do with a LAG and QUALIFY:
SELECT currency
,start_date
,rate
,lag(currency) over (order by start_date) as lag_currency
FROM fake_data
QUALIFY lag_currency IS NULL OR lag_currency != currency
ORDER BY 2;
and given the prior end_date on the row is not useful in that context, we can drop it; but now we want the next entry's start time to be the end_time of the streak:
SELECT currency
,start_date
,rate
,lead(start_date,1,current_date) over (order by start_date) as end_date
FROM (
SELECT currency
,start_date
,rate
,lag(currency) over (order by start_date) as lag_currency
FROM fake_data
)
WHERE lag_currency IS NULL OR lag_currency != currency
ORDER BY 2;
I swapped from QUALIFY to an outer WHERE clause, because that just seems to be common usage when using nested SELECTs.
CURRENCY | START_DATE | RATE | END_DATE |
---|---|---|---|
EUR | 2021-12-01 | 1.25 | 2021-12-01 |
US | 2021-12-01 | 0.75 | 2021-12-01 |
EUR | 2021-12-01 | 1.25 | 2021-12-10 |
US | 2021-12-10 | 0.8 | 2021-12-25 |
EUR | 2021-12-25 | 1.11 | 2021-12-25 |
US | 2021-12-25 | 1.11 | 2022-02-02 |
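The LAG-then-filter step can likewise be checked in miniature outside Snowflake; a small Python sketch of the same idea:

```python
def lag(values, offset=1, default=None):
    # LAG(v): the value `offset` rows back, else the default (NULL here)
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

currencies = ["EUR", "EUR", "US", "US", "EUR", "US"]  # already in start_date order
prev = lag(currencies)
# WHERE lag_currency IS NULL OR lag_currency != currency
kept = [c for c, p in zip(currencies, prev) if p is None or p != c]
```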
Now there are some things the JOIN method allows, like aggregation of count of streak rows and values on those, but we can do that here also, we just do a couple of extra steps.
WITH fake_data(currency, start_date, rate) AS (
SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3
FROM VALUES
('EUR','01/12/2021', 1.25),
('EUR','02/12/2021', 1.25),
('EUR','03/12/2021', 1.25),
('US','04/12/2021', 0.75),
('US','10/12/2021', 0.8),
('EUR','23/12/2021', 1.11),
('EUR','24/12/2021', 1.11),
('US','25/12/2021', 1.11)
)
SELECT a.*
,nvl(streak_start, lag(streak_start) ignore nulls over (order by start_date)) as streak_date
FROM (
SELECT currency
,start_date
,rate
,lag(currency) over (order by start_date) as lag_currency
,lag_currency IS NULL OR lag_currency != currency as new_row
,iff(new_row, start_date, null) as streak_start
FROM fake_data
) AS a
ORDER BY 2;
This gives a grouping token, streak_date:
CURRENCY | START_DATE | RATE | LAG_CURRENCY | NEW_ROW | STREAK_START | STREAK_DATE |
---|---|---|---|---|---|---|
EUR | 2021-12-01 | 1.25 | | TRUE | 2021-12-01 | 2021-12-01 |
EUR | 2021-12-02 | 1.25 | EUR | FALSE | | 2021-12-01 |
EUR | 2021-12-03 | 1.25 | EUR | FALSE | | 2021-12-01 |
US | 2021-12-04 | 0.75 | EUR | TRUE | 2021-12-04 | 2021-12-04 |
US | 2021-12-10 | 0.8 | US | FALSE | | 2021-12-04 |
EUR | 2021-12-23 | 1.11 | US | TRUE | 2021-12-23 | 2021-12-23 |
EUR | 2021-12-24 | 1.11 | EUR | FALSE | | 2021-12-23 |
US | 2021-12-25 | 1.11 | EUR | TRUE | 2021-12-25 | 2021-12-25 |
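The NVL(streak_start, LAG(streak_start) IGNORE NULLS ...) combination is effectively a forward fill of the last non-null streak start; a minimal Python equivalent of that step:

```python
# streak_start is non-null only on rows that begin a new streak
streak_start = ["2021-12-01", None, None, "2021-12-04", None,
                "2021-12-23", None, "2021-12-25"]

def forward_fill(values):
    # Carry the last non-None value forward (NVL + LAG ... IGNORE NULLS)
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

streak_date = forward_fill(streak_start)
```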
thus you can then wrap that and do any aggregation math you like:
WITH fake_data(currency, start_date, rate) AS (
SELECT column1, to_date(column2, 'dd/mm/yyyy'), column3
FROM VALUES
('EUR','01/12/2021', 1.25),
('EUR','02/12/2021', 1.26),
('EUR','03/12/2021', 1.27),
('US','04/12/2021', 0.75),
('US','10/12/2021', 0.8),
('EUR','23/12/2021', 1.11),
('EUR','24/12/2021', 1.14),
('US','25/12/2021', 1.11)
)
SELECT b.currency
,min(b.start_date) as start_date
,any_value(streak_end) as end_date
,count(*) as streak_count
,avg(rate) as avg_rate
,min(rate) as min_rate
,max(rate) as max_rate
FROM (
SELECT a.*
,nvl(streak_start, lag(streak_start) ignore nulls over (order by start_date)) as streak_date
,lead(streak_start,1,current_date) ignore nulls over (order by start_date) as streak_end
FROM (
SELECT currency
,start_date
,rate
,lag(currency) over (order by start_date) as lag_currency
,lag_currency IS NULL OR lag_currency != currency as new_row
,iff(new_row, start_date, null) as streak_start
FROM fake_data
) AS a
) as b
GROUP BY b.streak_date,1
ORDER BY 2;
CURRENCY | START_DATE | END_DATE | STREAK_COUNT | AVG_RATE | MIN_RATE | MAX_RATE |
---|---|---|---|---|---|---|
EUR | 2021-12-01 | 2021-12-04 | 3 | 1.26 | 1.25 | 1.27 |
US | 2021-12-04 | 2021-12-23 | 2 | 0.775 | 0.75 | 0.8 |
EUR | 2021-12-23 | 2021-12-25 | 2 | 1.125 | 1.11 | 1.14 |
US | 2021-12-25 | 2022-02-02 | 1 | 1.11 | 1.11 | 1.11 |
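The per-streak aggregation can also be double-checked with a quick Python pass over the same rates, grouping on the streak token (values here assume the 1.25/1.26/1.27 sample data):

```python
from itertools import groupby

# (streak_date, rate) pairs from the sample data, in start_date order
rows = [("2021-12-01", 1.25), ("2021-12-01", 1.26), ("2021-12-01", 1.27),
        ("2021-12-04", 0.75), ("2021-12-04", 0.8),
        ("2021-12-23", 1.11), ("2021-12-23", 1.14),
        ("2021-12-25", 1.11)]

stats = {}
for key, grp in groupby(rows, key=lambda r: r[0]):
    rates = [r[1] for r in grp]
    stats[key] = {"count": len(rates),
                  "avg": sum(rates) / len(rates),
                  "min": min(rates),
                  "max": max(rates)}
```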
Upvotes: 1
Reputation: 74605
however I don't have an end date field.
It's a lot easier if you do. You can "get the one from the next row" with LEAD:
WITH exchg_from_to AS(
SELECT
Currency,
StartDate,
LEAD(StartDate, 1, TO_DATE('9999-12-31')) OVER (PARTITION BY Currency ORDER BY StartDate) AS EndDate,
Rate
FROM
exchange_rate_table
)
SELECT *
FROM
tran t
JOIN exchg_from_to e
ON
t.TranDate >= e.StartDate AND
t.TranDate < e.EndDate AND
t.Currency = e.Currency
You might have to tweak that TO_DATE a bit... (I've never used Snowflake, but the docs say LEAD is supported.)
I don't recommend using BETWEEN because it's inclusive at both ends; if you have a tran that lands bang on a boundary date, it'll match twice: once with the start date of row N and once with the end date of row N-1. Using a >= and < pair ensures no overlap.
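The inclusive-start/exclusive-end matching can be illustrated in Python; with half-open windows, a transaction landing exactly on a boundary date matches exactly one rate (hypothetical EUR windows):

```python
from datetime import date

# Hypothetical EUR rate windows as [start, end) pairs with their rate
windows = [(date(2021, 12, 1), date(2021, 12, 25), 1.25),
           (date(2021, 12, 25), date(9999, 12, 31), 1.11)]

def rate_for(tran_date):
    # start <= tran_date < end: inclusive start, exclusive end, so no overlap
    matches = [r for s, e, r in windows if s <= tran_date < e]
    assert len(matches) == 1, "each date must match exactly one window"
    return matches[0]

boundary_rate = rate_for(date(2021, 12, 25))  # exactly on a boundary
```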
Upvotes: 2