Reputation: 31
There is a table (mileage_event) which shows mileage of all cars. Contains the following columns: created_at - date, car_id - car id, car_mileage - mileage of a car
It is worth considering that not all days have data. For example:
car_id | created_at | car_mileage |
---|---|---|
d016e61c5326-f246894a | 2023-05-22 | 100000 |
d016e61c5326-f246894a | 2023-05-26 | 105000 |
d016e61c5326-f246894a | 2023-05-27 | 110000 |
As you can see, there are no rows for this car between 2023-05-22 and 2023-05-26 (i.e. 3 days are absent). It means that the car hadn't been on drive for these days.
What do I want: to generate the missing days and add the nearest mileage value to them.
(Note: it's Exasol database)
As a result:
car_id | created_at | car_mileage |
---|---|---|
d016e61c5326-f246894a | 2023-05-22 | 100000 |
d016e61c5326-f246894a | 2023-05-23 | 100000 |
d016e61c5326-f246894a | 2023-05-24 | 100000 |
d016e61c5326-f246894a | 2023-05-25 | 100000 |
d016e61c5326-f246894a | 2023-05-26 | 105000 |
d016e61c5326-f246894a | 2023-05-27 | 110000 |
Thanks in advance!
UPD: I could be able to come up with this but it seems I'm wrong somewhere and do not see an error:
WITH date_series AS (select add_days(current_date - interval '2' year, level - 1) as dates
from dual
connect by level <= days_between(current_date,current_date - interval '2' year) + 1
order by local.dates) --to constantly get only last 2 years
,
combined AS (
SELECT
m.car_id as car_id,
d.dates,
COALESCE(m.car_mileage,
LAG(m.car_mileage, 1) OVER (PARTITION BY m.car_id ORDER BY d.dates)) AS car_mileage
FROM date_series d
LEFT JOIN mileage_event m ON d.dates = m.created_at
)
SELECT * FROM combined ORDER BY car_id, dates
And still I get this: | car_id | created_at | car_mileage | | --- | --- | --- | | d016e61c5326-f246894a | 2023-05-22 | 100000 | | null | 2023-05-23 | null | | null | 2023-05-24 | null | | null | 2023-05-25 | null | | d016e61c5326-f246894a | 2023-05-26 | 105000 | | d016e61c5326-f246894a | 2023-05-27 | 110000 |
I get that it's left join so it obviously cannot find the right values (and shows null instead).
Upvotes: 0
Views: 105
Reputation: 222
Ah, always good to see someone asking a legitimate question and people commenting "You should google it".
You can achieve this by leveraging common table expressions (CTEs), window functions, and joining with a series of dates. The general strategy is to:
Generate a series of dates spaning the range you're interested in. For each car_id, use a LEFT JOIN to combine this date series with your existing mileage data. Employ window functions to fill in the missing mileage data. Here is how i would do it:
-- Create a CTE with a series of dates. Adjust the start and end dates as needed.
WITH date_series AS (
SELECT CAST('2023-05-22' AS DATE) + ROW_NUMBER() OVER() - 1 AS created_at
FROM (SELECT 1 FROM DUAL CONNECT BY LEVEL <= 30) -- Generating 30 days, adjust this number as needed
),
-- LEFT JOIN the date series with your mileage data and use LAG() to fill in gaps.
combined AS (
SELECT
m.car_id,
d.created_at,
COALESCE(m.car_mileage,
LAG(m.car_mileage, 1) OVER (PARTITION BY m.car_id ORDER BY d.created_at)) AS car_mileage
FROM date_series d
LEFT JOIN mileage_event m ON d.created_at = m.created_at
)
-- Get result
SELECT * FROM combined WHERE car_id IS NOT NULL ORDER BY car_id, created_at
I hope that answer your question and works. Let me know if you need more help.
UPDATE
The issue is probably the LAG function in the combined CTE. When you are using the LAG function within the LEFT JOIN context, it will not have a car_id value to partition by for the rows that have a null car_mileage. This is why the LAG function doesn't return a value for these rows.
WITH date_series
AS (
SELECT add_days(CURRENT_DATE - interval '2' year, LEVEL - 1) AS dates
FROM dual CONNECT BY LEVEL <= days_between(CURRENT_DATE, CURRENT_DATE - interval '2' year) + 1
ORDER BY dates
)
,joined
AS (
SELECT m.car_id AS car_id
,d.dates
,m.car_mileage
FROM date_series d
LEFT JOIN mileage_event m ON d.dates = m.created_at
)
SELECT car_id
,dates AS created_at
,COALESCE(car_mileage, LAG(car_mileage, 1) OVER (
PARTITION BY car_id ORDER BY dates
)) AS car_mileage
FROM joined
ORDER BY car_id
,created_at;
Upvotes: 1
Reputation: 94884
I don't know Exasol syntax. In SQL you'll usually fill gaps in a series with a recursive query. All you'll do is take each row, look at their following row, if there is any, so you know how many days to add. Then add as many rows as you need.
The following is Oracle syntax. In Exasol you may have to add the keyword RECURSIVE
or change the interval to something like interval '1 days'
or the like. You can probably find this in the docs.
with cte (car_id, created_at, car_mileage, next_created_at) as
(
select
car_id, created_at, car_mileage,
lead(created_at) over (partition by car_id order by created_at) as next_created_at
from mytable
union all
select car_id, created_at + interval '1' day, car_mileage, next_created_at
from cte
where created_at + interval '1' day < next_created_at
)
select car_id, created_at, car_mileage
from cte
order by car_id, created_at;
Demo: https://dbfiddle.uk/61_Ht136
Upvotes: 0