Reputation: 2386
I'm using Postgres, but I'm fine with a generic solution to this (and you can ignore the text aggregation if you want).
With the following table setup:
id ref_id user_id start_time duration text
100 2000 1 15000 200 hello
101 2000 1 16000 300 world
102 2000 1 22000 400 foo
103 2000 2 17000 500 bar
104 2000 2 21000 600 baz
I'm trying to use the minimum integer timestamp (ms or whatever) and use that as the basis for windowing my query. The idea being that the absolute minimum across all users it the window for the conversation. So if i looked at 5000-unit windows, I'd have 2 blocks (15000-20000, 20001-25000).
Expected output:
ref_id user_id block sum(duration) count(*) text
2000 1 1 500 2 hello world
2000 1 2 400 1 foo
2000 2 1 500 1 bar
2000 2 2 600 1 baz
I've gone through multiple iterations of self join and window, but I can't get it more compact than three nested queries.
select user_id, block, string_agg(text, ' '), sum(duration)
from
(select user_id, FLOOR((start_time - t1.st)/5000) as block, start_time, text, duration
from table t0
inner join
(select id, min(start_time) as st from table group by 1) as t1
on t0.ref_id = t1.ref_id
order by 1, 2, 3) t2
group by 1, 2;
I think the reason I need the third is because I couldn't order by start_time, and it matters for some of aggregations (text concatenation, for example).
I'd love some help if anyone's an expert!
Upvotes: 1
Views: 110
Reputation: 28243
Assuming the field called id
in the example sql is a typo and should be user_id
as specified in the example data
The simplest way to write this query would be:
SELECT
user_id
, FLOOR((start_time - mst)/5000) + 1 block
, SUM(duration)
, STRING_AGG("text", ' ')
FROM mytable, (SELECT MIN(start_time) mst FROM mytable) minst
GROUP BY 1, 2
You don't need any window functions or nested queries. Here, the ,
separating the two relations in the FROM
clause performs a CROSS JOIN
(or Cartesian Product) implicitly. This is how most people start writing joins in SQL queries.
However, using ,
to implicitly mean CROSS JOIN
is often perceived as bad style, so instead the above query can be written as:
SELECT
user_id
, FLOOR((start_time - mst)/5000) + 1 block
, SUM(duration)
, STRING_AGG("text", ' ')
FROM mytable CROSS JOIN (SELECT MIN(start_time) mst FROM mytable) minst
GROUP BY 1, 2
Also, from your example sql attempt and desired output, it seems that you'll have to add 1 to FLOOR((start_time - minst)/5000)
to get the blocks to start from 1, 2, ...
Upvotes: 1
Reputation: 1269703
I'm not surer why that "block" is so important, but you can remove one level of subqueries:
select user_id, FLOOR((start_time - minstart)/5000),
string_agg(text, ' '), sum(duration)
from (select user_id, as block,
start_time, text, duration,
min(start_time) over (partition by id) as minstart
from table t0
) t
group by 1, 2;
Upvotes: 1