charles
charles

Reputation: 547

Transform a correlated subquery into a join

I want to express this:

SELECT
    a.*
    ,b.timestamp_col
FROM weird_data_source a
LEFT JOIN weird_data_source b
ON a.id = b.id
    AND b.timestamp_col = (
        SELECT
            MAX(sub.timestamp_col)
        FROM weird_data_source sub
        WHERE sub.id = a.id
            AND sub.date_col <= a.date_col
            AND sub.timestamp_col < a.timestamp_col
    )

A couple notes here about the data:

  1. date_col and timestamp_col aren't representing the same thing.
  2. I'm not kidding... the data is really structured like this.

But the subquery is invalid. Netezza cannot handle the < operator in the correlated subquery. For the life of me I cannot figure out an alternative. How could I get around this?

My gut is telling me this could potentially be done with a join, but I haven't been able to be successful at this yet.

There's a dozen or so similar questions, but none of them seem to get at handling this type of inequality.

Upvotes: 1

Views: 490

Answers (1)

Tom H
Tom H

Reputation: 47382

This should get you pretty close. You will get duplicate rows if there are two rows with the exact same timestamp_col that otherwise meet the criteria, but otherwise you should be good:

SELECT
    a.id,
    a.some_other_columns,    -- Because we NEVER use SELECT *
    b.timestamp_col
FROM
    weird_data_source a
LEFT JOIN weird_data_source b ON
    a.id = b.id
LEFT OUTER JOIN weird_data_source c ON
    c.id = a.id AND
    c.date_col <= a.date_col AND
    c.timestamp_col < a.timestamp_col
LEFT OUTER JOIN weird_data_source d ON
    d.id = a.id AND
    d.date_col <= a.date_col AND
    d.timestamp_col < a.timestamp_col AND
    d.timestamp_col > c.timestamp_col
WHERE
    d.id IS NULL

The query is basically looking for a matching row where no other matching row is found with a greater timestamp_col value - hence the d.id IS NULL. That column will only be NULL if no match is found.

Upvotes: 1

Related Questions