Rich
Rich

Reputation: 21

Using DB2 SQL, how can I avoid returning rows that aren't the most recent, for a condition that is NOT part of my where clause?

SCENARIO DATA:

Row   Col1    Col2    Col3         Col4    Col5
1     Bob     Smith   2000-01-01   John    Doe
2     Bob     Smith   2010-01-01   Jane    Jones
3     Ted     Jones   2005-01-01   Pete    Mills
4     Ted     Jones   2008-01-01   John    Doe

My SQL:

select col1, col2 from schema.table where col4='John' and col5='Doe'

I have no clue how to restrict my data as per need stated below.

I'm searching for where John Doe is in column 4/5 BUT I only want to retrieve the row if it's the current relationship (based on column 3) for the person in columns 1/2.

In the above data, I DO want the 4th row because John Doe is currently related to Ted Jones. But I do NOT want to retrieve the first row...because the John Doe row is NOT the current relation for Bob Smith. The current relation for Bob Smith is to Jane Jones.

IF I were searching for Pete Mills, I wouldn't want to find anything because Pete Mills is not the current relationship for Ted Jones.

Upvotes: 2

Views: 58

Answers (1)

e_i_pi
e_i_pi

Reputation: 4820

You need to use something that DB2 call OLAP functions (other RDMS engines tend to call them window functions - both terms are valid). An OLAP function allows you to return analytic data about a row, e.g. what number it is in a sorted list.

For your query, you only want to return the first row for each person, so we use an OLAP function called ROW_NUMBER(). When we use this function, we want to partition over the Col1/Col2 data (i.e. we are grouping on that data) and then reverse sort on Col3 to get the latest result. After that, we want to pick the rows that have the top value in the OLAP function, like so:

SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col3 DESC) AS rn
    FROM schema.table
) AS ranked_scenarios
WHERE rn = 1

Upvotes: 3

Related Questions