Reputation: 703
I am trying to write a query to compare the number of average connections today for a given computer to the number of average connections between 7 and 14 days ago. I figured this was best handled by a window function but I am not able to get the syntax correct for the date.
Assume I have a table of IP addresses and connection records called iptable with soucreip, destinationip, timestamp as the columns. Here is the query I am trying for the previous 7 day window to just get the count per sourceip:
select
sourceip,
destinationip,
timestamp,
count(*) OVER (PARTITION BY sourceip order by timestamp
RANGE BETWEEN now() - '7 day'::Interval PRECEDING
now() - '14 day'::Interval FOLLOWING)
from
iptable;
What is the best way to write this type of query does the window function approach make sense or is there a more optimized way to do things for the case of large tables?
Upvotes: 3
Views: 9798
Reputation: 21
-- you can do it with filter
select
sourceip,
destinationip,
timestamp,
count(*) filter (where timestamp BETWEEN
now() - '7 day'::Interval
and now() - '14 day'::Interval)
OVER (PARTITION BY sourceip order by timestamp)
from
iptable;
Upvotes: 0
Reputation: 658062
To get ...
the number of average connections between 7 and 14 days ago
SELECT sourceip, destinationip, timestamp, count(*) AS ct
FROM iptable
WHERE "timestamp" BETWEEN now() - '14 day'::interval
AND now() - '7 day'::interval
GROUP BY 1,2,3;
Just use a plain aggregate function.
And don't use timestamp
as column name. It's a protected word in the SQL standard and partly reserved n PostgreSQL.
Upvotes: 5
Reputation: 324751
Part of your issue is that you've chosen a terrible column name, "timestamp"
. timestamp
is the name of a built-in data type, so to use it as a column name you must "double quote"
it everywhere.
That's not all, though. Your window function syntax is wrong. See window function syntax. You forgot the AND
; it's RANGE BETWEEN .. PRECEDING AND ... FOLLOWING
.
Also, though it isn't the cause of the issues, you should use the SQL-standard current_timestamp
instead of now()
.
That'll get you to a new error:
CREATE TABLE iptable ( sourceip cidr, destinationip cidr, "timestamp" timestamptz);
regress=> select
sourceip,
destinationip,
timestamp,
count(*) OVER (PARTITION BY sourceip order by "timestamp" RANGE BETWEEN current_timestamp - '7 day'::Interval PRECEDING AND current_timestamp - '14 day'::Interval FOLLOWING)
from
iptable;
ERROR: RANGE PRECEDING is only supported with UNBOUNDED
LINE 5: ... OVER (PARTITION BY sourceip order by "timestamp" RANGE BETW...
^
which suggests that the current window function implementation won't do what you want it to do. Sadly.
The value PRECEDING and value FOLLOWING cases are currently only allowed in ROWS mode. They indicate that the frame starts or ends with the row that many rows before or after the current row. value must be an integer expression not containing any variables, aggregate functions, or window functions.
Instead I'd just use a plain GROUP BY
with a WHERE
filter on the input rows.
select
sourceip,
count(sourceip) AS n_conns_7_to_14_days_ago
from
iptable
WHERE age("timestamp") BETWEEN INTERVAL '7' DAY AND INTERVAL '14' DAY
GROUP BY sourceip;
Upvotes: 7