Reputation: 353
I have 3 tables, users, trips and tripDetails. When the user creates a trip a row with the following fields is created in the trips table:
id(INT), user_id(INT) & dateCreated(DATE)
and 4 rows are created in the tripDetails table:
id(INT), trip_id(INT), field(VARCHAR) & value(VARCHAR)
where field is either 'smartbox.destination', 'smartbox.dateFrom', 'smartbox.dateTo', 'smartbox.numberOfPeople' and the value column is a varchar. However, let's say the user changes the destination and saves this change, a new record is created in the trips table and ONLY ONE record is created in the tripDetails table (the updated destination)
Now I want to create a select which would give me a snapshot of a user's trip for a given day with the column headers:
user_id, trip_id, destination, dateFrom, dateTo, numberOfPeople, givenDay(DATE)
such that if one field had been changed on that given day, all the other columns would show what their most recent value was, relative to that day.
I've set up a sqlfiddle here
Upvotes: 1
Views: 253
Reputation: 4141
First of all, allow me to say: You have some seriously flawed data model with that "overriding key/value pair" way of dealing with data.
And now for the solution to your problem. Assuming that
tripDetails.value
column is declared not null
,givenDate
,yyyy-mm-dd
,your query could look like
with pivot$ as (
select
U.id as user_id, T.id as trip_id, max(T.dateCreated) as trip_date,
max(decode(TD.field, 'smartbox.destination', TD.value)) as trip_destination,
max(decode(TD.field, 'smartbox.dateFrom', TD.value)) as trip_date_from,
max(decode(TD.field, 'smartbox.dateTo', TD.value)) as trip_date_to,
max(decode(TD.field, 'smartbox.numberOfPeople', TD.value)) as trip_no_of_people
from users U
join trips T
on T.user_id = U.id
join tripDetails TD
on TD.trip_id = T.id
and TD.field in ('smartbox.destination', 'smartbox.dateFrom', 'smartbox.dateTo', 'smartbox.numberOfPeople')
where T.dateCreated <= date'&givenDate'
group by U.id, T.id
),
resolve_versioning$ as (
select user_id, trip_id, trip_date,
first_value(trip_destination) ignore nulls over (partition by user_id order by trip_date desc rows between current row and unbounded following) as trip_destination,
first_value(trip_date_from) ignore nulls over (partition by user_id order by trip_date desc rows between current row and unbounded following) as trip_date_from,
first_value(trip_date_to) ignore nulls over (partition by user_id order by trip_date desc rows between current row and unbounded following) as trip_date_to,
first_value(trip_no_of_people) ignore nulls over (partition by user_id order by trip_date desc rows between current row and unbounded following) as trip_no_of_people,
row_number() over (partition by user_id order by trip_date desc) as relevance$
from pivot$
)
select user_id, trip_id,
trip_destination, trip_date_from, trip_date_to, trip_no_of_people,
date'&givenDate' as given_date
from resolve_versioning$
where relevance$ <= 1
;
This, in three steps, does:
pivot$
subquery denormalizes your key/value pairs to wider rows, with trip_id
as the dataset's logical primary key, effectively leaving columns NULL
when there's no key/value pair for that trip_id
. (btw., this is where the non-nullability of the tripDetails.value
column is crucial for success of the query)resolve_versioning$
subquery takes advantage of the first_value()
analytic function, working over each individual trip detail of all trips for a user (partition by user_id
), finding the first (first_value
) non-NULL (ignore nulls
) value of the respective trip detail, searching from the "youngest" trip date back to older days (order by trip_date desc
) ... Or, if you look at it the other way round, it looks for the last non-NULL value of the trip detail in an ordering by the trip date.rows between current row and unbounded following
is a kind of "magic", necessary to correctly process the window for the particular analytical order by
. (Read here for an explanation.)row_number() over (partition by user_id order by trip_date desc)
simply numbers all resulting rows from 1
upwards, where 1
is assigned to the "youngest" row in the ordering by the trip date. Then, in the outermost select, the whole result is filtered to show only the youngest row (relevance$ <= 1
).Enjoy!
Upvotes: 1