jzadeh
jzadeh

Reputation: 703

Window function trailing dates in PostgreSQL

I am trying to write a query to compare the number of average connections today for a given computer to the number of average connections between 7 and 14 days ago. I figured this was best handled by a window function but I am not able to get the syntax correct for the date.

Assume I have a table of IP addresses and connection records called iptable with soucreip, destinationip, timestamp as the columns. Here is the query I am trying for the previous 7 day window to just get the count per sourceip:

select 
  sourceip, 
  destinationip, 
  timestamp, 
  count(*) OVER (PARTITION BY sourceip order by timestamp
                 RANGE BETWEEN now() - '7 day'::Interval PRECEDING
                               now() - '14 day'::Interval FOLLOWING)
from
 iptable;

What is the best way to write this type of query does the window function approach make sense or is there a more optimized way to do things for the case of large tables?

Upvotes: 3

Views: 9798

Answers (3)

Pavel Levchenko
Pavel Levchenko

Reputation: 21

-- you can do it with filter

select 
  sourceip, 
  destinationip, 
  timestamp, 
  count(*) filter (where timestamp BETWEEN 
                  now() - '7 day'::Interval 
              and now() - '14 day'::Interval) 
           OVER (PARTITION BY sourceip order by timestamp)
from
 iptable;

Upvotes: 0

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658062

To get ...

the number of average connections between 7 and 14 days ago

SELECT sourceip, destinationip, timestamp, count(*) AS ct
FROM   iptable
WHERE  "timestamp" BETWEEN now() - '14 day'::interval
                   AND     now() -  '7 day'::interval
GROUP BY 1,2,3;

Just use a plain aggregate function.
And don't use timestamp as column name. It's a protected word in the SQL standard and partly reserved n PostgreSQL.

Upvotes: 5

Craig Ringer
Craig Ringer

Reputation: 324751

Part of your issue is that you've chosen a terrible column name, "timestamp". timestamp is the name of a built-in data type, so to use it as a column name you must "double quote" it everywhere.

That's not all, though. Your window function syntax is wrong. See window function syntax. You forgot the AND; it's RANGE BETWEEN .. PRECEDING AND ... FOLLOWING.

Also, though it isn't the cause of the issues, you should use the SQL-standard current_timestamp instead of now().

That'll get you to a new error:

CREATE TABLE iptable ( sourceip cidr, destinationip cidr, "timestamp" timestamptz);

regress=> select 
  sourceip, 
  destinationip, 
  timestamp, 
  count(*) OVER (PARTITION BY sourceip order by "timestamp" RANGE BETWEEN current_timestamp - '7 day'::Interval PRECEDING AND current_timestamp - '14 day'::Interval FOLLOWING)

from
 iptable;
ERROR:  RANGE PRECEDING is only supported with UNBOUNDED
LINE 5: ... OVER (PARTITION BY sourceip order by "timestamp" RANGE BETW...
                                                             ^

which suggests that the current window function implementation won't do what you want it to do. Sadly.

The value PRECEDING and value FOLLOWING cases are currently only allowed in ROWS mode. They indicate that the frame starts or ends with the row that many rows before or after the current row. value must be an integer expression not containing any variables, aggregate functions, or window functions.

Instead I'd just use a plain GROUP BY with a WHERE filter on the input rows.

select 
  sourceip,
  count(sourceip) AS n_conns_7_to_14_days_ago
from
 iptable
WHERE age("timestamp") BETWEEN INTERVAL '7' DAY AND INTERVAL '14' DAY
GROUP BY sourceip;

Upvotes: 7

Related Questions