Reputation: 1002

Redshift: Running query using GETDATE() at specified list of times

So, I have a query that uses GETDATE() in WHERE and HAVING clauses:

SELECT GETDATE(), COUNT(*) FROM (
  SELECT 1 FROM events
  WHERE (event_time > (GETDATE() - interval '25 hours'))
  GROUP BY id
  HAVING MAX(event_time) BETWEEN (GETDATE() - interval '25 hours') AND (GETDATE() - interval '24 hours')
)

I'm basically trying to find the number of unique ids that have their latest event_time between 25 and 24 hours ago with respect to the current time.

The problem: I have another table query_dts which contains one column containing timestamps. Instead of running the above query on the current time, using GETDATE(), I need to run in on the timestamp of every entry of the query_dts table. Any ideas?

Note: I'm not really storing query_dts anywhere. I've created it like this:

WITH query_dts AS (
  SELECT (
    DATEADD(hour,-(row_number() over (order by true)), getdate())
  ) as n
  FROM events LIMIT 48
),

which I got from here

Upvotes: 2

Answers (3)

denismo

Reputation: 800

How about avoiding the generator altogether and instead just splitting the intervals:

SELECT
  dateadd(hour, -distance, getdate()),
  count(0) AS event_count
FROM (
       SELECT
         id,
         datediff(hour, max(event_time), getdate()) AS distance
       FROM events
       WHERE event_time > getdate() - INTERVAL '2 days'
       GROUP BY id) AS events_with_distance
GROUP BY distance;

Upvotes: 2

shashwat

Reputation: 1002

Here's what I ended up doing:

WITH max_time_table AS
(
  SELECT id, max(event_time) AS max_time
  FROM events
  WHERE (event_time > GETDATE() - interval '74 hours')
  GROUP BY id
),
query_dts AS
(
  SELECT (DATEADD(hour,-(row_number() over (ORDER BY TRUE) - 1), getdate()) ) AS n
  FROM events LIMIT 48
)
SELECT query_dts.n, COUNT(*)
FROM max_time_table JOIN query_dts
  ON max_time_table.max_time BETWEEN (query_dts.n - interval '25 hours') AND (query_dts.n - interval '24 hours')
GROUP BY query_dts.n
ORDER BY query_dts.n DESC

Here, I selected 74 hours because I wanted 48 hours ago + 25 hours ago = 73 hours ago.

The problem is that this isn't a general-purpose way of doing this. It's a highly specific solution for this particular problem. Can someone think of a more general way of running a query dependent on GETDATE() using a column of dates in another table?

Upvotes: 0

Gordon Linoff

Reputation: 1271151

You can use a JOIN to combine the two queries. Then you just need to substitute the values for your date expression. I think this is the logic:

WITH query_dts AS (
      SELECT DATEADD(hour, -(row_number() over (order by true)), getdate()) as n
      FROM events
      LIMIT 48
     )
SELECT d.n, COUNT(*)
FROM (SELECT d.n
      FROM events e JOIN
           query_dts d
      WHERE e.event_time > d.n
      GROUP BY id
      HAVING MAX(event_time) BETWEEN n - interval '25 hours' AND n
     ) i;

Upvotes: 0

Redshift: Running query using GETDATE() at specified list of times

Answers (3)

Related Questions