Reputation: 21
SCENARIO DATA:
Row Col1 Col2 Col3 Col4 Col5
1 Bob Smith 2000-01-01 John Doe
2 Bob Smith 2010-01-01 Jane Jones
3 Ted Jones 2005-01-01 Pete Mills
4 Ted Jones 2008-01-01 John Doe
My SQL:
select col1, col2 from schema.table where col4='John' and col5='Doe'
I have no clue how to restrict my data as per need stated below.
I'm searching for where John Doe is in column 4/5 BUT I only want to retrieve the row if it's the current relationship (based on column 3) for the person in columns 1/2.
In the above data, I DO want the 4th row because John Doe is currently related to Ted Jones. But I do NOT want to retrieve the first row...because the John Doe row is NOT the current relation for Bob Smith. The current relation for Bob Smith is to Jane Jones.
IF I were searching for Pete Mills, I wouldn't want to find anything because Pete Mills is not the current relationship for Ted Jones.
Upvotes: 2
Views: 58
Reputation: 4820
You need to use something that DB2 call OLAP functions (other RDMS engines tend to call them window functions - both terms are valid). An OLAP function allows you to return analytic data about a row, e.g. what number it is in a sorted list.
For your query, you only want to return the first row for each person, so we use an OLAP function called ROW_NUMBER()
. When we use this function, we want to partition over the Col1/Col2 data (i.e. we are grouping on that data) and then reverse sort on Col3 to get the latest result. After that, we want to pick the rows that have the top value in the OLAP function, like so:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col3 DESC) AS rn
FROM schema.table
) AS ranked_scenarios
WHERE rn = 1
Upvotes: 3