jolu10
jolu10

Reputation: 13

HANA SQL Filling missing gaps in a date table with balance history

On Hana Sql environment I have this table with changes of balances from customers accounts by dates:

"BalanceTable"

CustomerID BalDate Balance
1 2021-06-01 0
1 2021-06-04 100
1 2021-06-28 500
2 2021-06-01 200
2 2021-06-03 0
2 2021-07-02 300
...

The table has several rows.

I have created now a date table with all the dates of the interval using the earliest day as first row and latest day as last row:

"DateTable"

Day
2021-06-01
2021-06-02
2021-06-03
2021-06-04
2021-06-05
2021-06-06
...
2021-07-02

I need to join the two tables having the daily balance of each customer:

Day CustomerID Balance
2021-06-01 1 0
2021-06-02 1 0
2021-06-03 1 0
2021-06-04 1 100
2021-06-05 1 100
2021-06-06 1 100
...
2021-06-27 1 100
2021-06-28 1 500
2021-06-29 1 500
2021-06-30 1 500
2021-07-01 1 500
2021-07-02 1 100
2021-06-01 2 200
2021-06-02 2 200
2021-06-03 2 0
2021-06-04 2 0
2021-06-05 2 0
...
2021-06-30 2 0
2021-07-01 2 0
2021-07-02 2 300

As first aproach I have tried joining the two tables using a left join:

SELECT * FROM "DateTable" T0 LEFT JOIN "BalanceTable" T1 ON T0."Day"=T1."BalDate"

But I know the proper solution is far beyond my limited SQL knowledge. The key is being able to fill in the gaps for the days of the "DateTable" that don't have a balance value in the "BalanceTable" with the balance of the previous last day with data.

I've read similar cases and they combine IFNULL function to fill gaps with PARTITION BY clause to get the last value, but after many attempts I wasn't able to apply that to my case.

Thank you for your ideas and sorry if I miss something, this is my first post asking for help.

Upvotes: 1

Views: 833

Answers (1)

Mathias Kemeter
Mathias Kemeter

Reputation: 1183

So you have this example data:

CREATE TABLE BALANCETAB (CUSTOMERID INTEGER, BALDATE DATE, BALANCE INTEGER);

INSERT INTO BALANCETAB VALUES (1, '2021-06-01', 0);
INSERT INTO BALANCETAB VALUES (1, '2021-06-04', 100);
INSERT INTO BALANCETAB VALUES (1, '2021-06-28', 500);
INSERT INTO BALANCETAB VALUES (2, '2021-06-01', 200);
INSERT INTO BALANCETAB VALUES (2, '2021-06-03', 0);
INSERT INTO BALANCETAB VALUES (1, '2021-07-02', 300);

You already headed in the right direction by creating the dates table:

CREATE TABLE DATETAB AS (
    SELECT GENERATED_PERIOD_START DAY 
    FROM SERIES_GENERATE_DATE('INTERVAL 1 DAY', '2021-06-01' ,'2021-07-02')
);

However, additionally you will need to know all customers since you want to add one row per date and per customer (cross join):

CREATE TABLE CUSTOMERTAB AS (
    SELECT DISTINCT CUSTOMERID FROM BALANCETAB
);

From this you can infer the table with NULL values, that you would like to fill:

WITH DATECUSTOMERTAB AS (
    SELECT * FROM DATETAB, CUSTOMERTAB
)
SELECT DCT.DAY, DCT.CUSTOMERID, BT.BALANCE
FROM DATECUSTOMERTAB DCT
LEFT JOIN BALANCETAB BT ON DCT.DAY = BT.BALDATE AND DCT.CUSTOMERID = BT.CUSTOMERID
ORDER BY DCT.CUSTOMERID, DCT.DAY;

On this table, you can apply a self-join (BTFILL) and use window function RANK (documentation) to determine the latest previous balance value.

WITH DATECUSTOMERTAB AS (
    SELECT * FROM DATETAB, CUSTOMERTAB
)
SELECT DAY, CUSTOMERID, IFNULL(BALANCE, BALANCEFILL) BALANCE_FILLED
FROM
(
    SELECT DCT.DAY, DCT.CUSTOMERID, BT.BALANCE, BTFILL.BALANCE AS BALANCEFILL,
        RANK() OVER (PARTITION BY DCT.DAY, DCT.CUSTOMERID, BT.BALANCE ORDER BY BTFILL.BALDATE DESC) RNK
    FROM DATECUSTOMERTAB DCT
    LEFT JOIN BALANCETAB BT ON DCT.DAY = BT.BALDATE AND DCT.CUSTOMERID = BT.CUSTOMERID
    LEFT JOIN BALANCETAB BTFILL ON BTFILL.BALDATE <= DCT.DAY AND DCT.CUSTOMERID = BTFILL.CUSTOMERID AND BTFILL.BALANCE IS NOT NULL
)
WHERE RNK = 1
ORDER BY CUSTOMERID, DAY;

Of course, you would omit the explicit creation of tables DATETAB and CUSTOMERTAB. The list of expected customer would probably already exist somewhere in your system and the series generator function could be part of the final statement.

Upvotes: 1

Related Questions