Reputation: 79
I am having trouble querying some data. The table I am trying to pull the data from is a LOG table, where I would like to see changes in the values next to each other (example below)
Table:
+-----------+----+-------------+----------+------------+ | UNIQUE_ID | ID | NAME | CITY | DATE | +-----------+----+-------------+----------+------------+ | xa220 | 1 | John Smith | Berlin | 2020.05.01 | | xa195 | 1 | John Smith | Berlin | 2020.03.01 | | xa111 | 1 | John Smith | München | 2020.01.01 | | xa106 | 2 | James Brown | Atlanta | 2018.04.04 | | xa100 | 2 | James Brown | Boston | 2017.12.10 | | xa76 | 3 | Emily Wolf | Shanghai | 2016.11.03 | | xa20 | 3 | Emily Wolf | Shanghai | 2016.07.03 | | xa15 | 3 | Emily Wolf | Tokyo | 2014.02.22 | | xa12 | 3 | Emily Wolf | null | 2014.02.22 | +-----------+----+-------------+----------+------------+
Desired outcome:
+----+-------------+----------+---------------+ | ID | NAME | CITY | PREVIOUS_CITY | +----+-------------+----------+---------------+ | 1 | John Smith | Berlin | München | | 2 | James Brown | Atlanta | Boston | | 3 | Emily Wolf | Shanghai | Tokyo | | 3 | Emily Wolf | Tokyo | null | +----+-------------+----------+---------------+
I have been trying to use FIRST and LAST values, however, cannot get the desired outcome.
select distinct id, name, city, first_value(city) over (partition by id order by city) as previous_city from test
Any help is appreciated! Thank you!
Upvotes: 1
Views: 578
Reputation: 521239
Use the LAG
function to get the city for previous date and display only the rows where current city and the result of lag are different:
WITH cte AS (
SELECT t.*, LAG(CITY, 1, CITY) OVER (PARTITION BY ID ORDER BY "DATE") LAG_CITY
FROM yourTable t
)
SELECT ID, NAME, CITY, LAG_CITY AS PREVIOUS_CITY
FROM cte
WHERE
CITY <> LAG_CITY OR
CITY IS NULL AND LAG_CITY IS NOT NULL OR
CITY IS NOT NULL AND LAG_CITY IS NULL
ORDER BY
ID, "DATE" DESC;
Some comments on how LAG
is being used and its values checked are warranted. We use the three parameter version of LAG
here. The second parameter means the number of records to look back, which in this case is 1 (the default). The third parameter means the default value to use should a given record per ID
partition be the first. In this case, we use the default as the same CITY
value. This means that the first record would never appear in the result set.
For the WHERE
clause above, a matching record is one for which the city and lag city are different, or for where one of the two be NULL
and the other not NULL
. This is the logic needed to treat a NULL
city and some not NULL
city value as being different.
Upvotes: 2