Maxi Hui
Maxi Hui

Reputation: 49

Querying previous row of data if data is not available for today

Context:

I am querying daily foreign exchange rate. Currently, we get the info via an API that only runs starting at a specific time of day. However, I would like to be able to show the exchange rate even before the data is available.

So, for example, let's say the API grabs the data at 7am local time, meaning that if I want to query the foreign exchange rate before 7am, nothing would show since the backend script hasn't begun loading the data into the database. Since I have no access to backend, I want to use a workaround using SQL to retrieve yesterday's rate instead until today's rate is available.

Below is how the table looks:

Table: forex_daily_rate
ID              DATE            SOURCE  TARGET      RATE
20210531AUDUSD  2021-05-31      AUD     USD         0.772797527
20210531CADUSD  2021-05-31      CAD     USD         0.8280887711
20210531EURUSD  2021-05-31      EUR     USD         1.219066195
20210531GBPUSD  2021-05-31      GBP     USD         1.418238548
20210530AUDUSD  2021-05-30      AUD     USD         0.7714858818
20210530CADUSD  2021-05-30      CAD     USD         0.8287750704
20210530EURUSD  2021-05-30      EUR     USD         1.219363492
20210530GBPUSD  2021-05-30      GBP     USD         1.420252805

Notes:

I have tried the query below:

WITH _base as (
SELECT * FROM forex_daily_rates
)

SELECT B.id, B.date, B.source, B.target,
CASE WHEN max(B.date) != current_date THEN lag(rate) OVER (PARTITION BY target,source ORDER BY B.date DESC)
    ELSE rate
END as rate
FROM _base B

Even though there is no syntax error, it still does not display the rate for 2021-06-01 (assuming of course that 2021-06-01's data has not been loaded into the database yet).

The expected end result, given that 2021-06-01 is not available yet, should be:

ID              DATE            SOURCE  TARGET      RATE
20210601AUDUSD  2021-06-01      AUD     USD         0.772797527
20210601CADUSD  2021-06-01      CAD     USD         0.8280887711
20210601EURUSD  2021-06-01      EUR     USD         1.219066195
20210601GBPUSD  2021-06-01      GBP     USD         1.418238548
20210531AUDUSD  2021-05-31      AUD     USD         0.772797527
20210531CADUSD  2021-05-31      CAD     USD         0.8280887711
20210531EURUSD  2021-05-31      EUR     USD         1.219066195
20210531GBPUSD  2021-05-31      GBP     USD         1.418238548

I have thought maybe because the date does not exist in the table yet, therefore there's nothing for the query to append the value too. I have tried creating a temporary table which generates list of dates starting from 2017-01-01 called calendar, but still didn't work.

WITH calendar AS (
  SELECT date_trunc('day', dd)::date AS date
     FROM generate_series ('2017-01-01'::TIMESTAMP, CURRENT_DATE , '1 day'::interval) dd
    ),
    
    _base as (
    SELECT * FROM forex_daily_rates
    )

SELECT B.id, C.date, B.source, B.target,
CASE WHEN max(C.date) != current_date THEN lag(rate) OVER (PARTITION BY target,source ORDER BY C.date DESC)
    ELSE rate
END as rate
FROM calendar C
LEFT JOIN _base B ON C.date = B.date
GROUP BY 1,3,4, C.date
ORDER BY 2 DESC

Would appreciate the help, whether in Postgres and Snowflake syntax is fine! Many thanks!

Upvotes: 1

Views: 195

Answers (2)

Simon D
Simon D

Reputation: 6229

I think the following should work. I tried to avoid using a union so that it performs better if you had a big table.

Create table with sample data:

create or replace transient table test_table as (
    select
        column1 as id,
        column2 as date,
        column3 as source,
        column4 as target,
        column5 as rate
    from (values ('20210602AUDUSD', '2021-06-02'::date, 'AUD', 'USD', 0.11111),
                 ('20210602CADUSD', '2021-06-02'::date, 'CAD', 'USD', 0.11111),
                 ('20210601AUDUSD', '2021-06-01'::date, 'AUD', 'USD', 0.22222),
                 ('20210601CADUSD', '2021-06-01'::date, 'CAD', 'USD', 0.22222),
                 ('20210601EURUSD', '2021-06-01'::date, 'EUR', 'USD', 0.22222),
                 ('20210601GBPUSD', '2021-06-01'::date, 'GBP', 'USD', 0.22222)
             )
);

Query

with latest_rates as (
    select
        to_char(current_date, 'YYYYMMDD') || source || target as id,
        current_date                                          as date,
        source,
        target,
        rate
    from test_table
        qualify row_number() over (partition by source, target order by date desc) = 1
)
select * from latest_rates
union all
select * from test_table where id not in (select id from latest_rates)
;

Results

+--------------+----------+------+------+-------+
|ID            |DATE      |SOURCE|TARGET|RATE   |
+--------------+----------+------+------+-------+
|20210602AUDUSD|2021-06-02|AUD   |USD   |0.11111|
|20210602CADUSD|2021-06-02|CAD   |USD   |0.11111|
|20210602EURUSD|2021-06-02|EUR   |USD   |0.22222|
|20210602GBPUSD|2021-06-02|GBP   |USD   |0.22222|
|20210601AUDUSD|2021-06-01|AUD   |USD   |0.22222|
|20210601CADUSD|2021-06-01|CAD   |USD   |0.22222|
|20210601EURUSD|2021-06-01|EUR   |USD   |0.22222|
|20210601GBPUSD|2021-06-01|GBP   |USD   |0.22222|
+--------------+----------+------+------+-------+

Upvotes: 1

Mike Walton
Mike Walton

Reputation: 7339

If you are only ever returning the most recent data (which I assume since your case statement is only evaluating current_date, couldn't you just use a window function and choose the most recent record?

For example:

SELECT B.id, B.date, B.source, B.target, B.rate
FROM _base B
QUALIFY row_number() OVER (PARTITION BY target,source ORDER BY B.date DESC) = 1

Upvotes: 0

Related Questions