Any way to clean up this complex aggregate function?

Question

I'm using Postgres, but I'm fine with a generic solution to this (and you can ignore the text aggregation if you want).

With the following table setup:

id  ref_id  user_id  start_time  duration text
100 2000    1        15000       200      hello
101 2000    1        16000       300      world
102 2000    1        22000       400      foo
103 2000    2        17000       500      bar
104 2000    2        21000       600      baz

I'm trying to use the minimum integer timestamp (ms or whatever) and use that as the basis for windowing my query. The idea being that the absolute minimum across all users it the window for the conversation. So if i looked at 5000-unit windows, I'd have 2 blocks (15000-20000, 20001-25000).

Expected output:

ref_id  user_id  block  sum(duration) count(*) text
2000    1        1      500           2        hello world
2000    1        2      400           1        foo
2000    2        1      500           1        bar
2000    2        2      600           1        baz

I've gone through multiple iterations of self join and window, but I can't get it more compact than three nested queries.

select user_id, block, string_agg(text, ' '), sum(duration)
from 
    (select user_id, FLOOR((start_time - t1.st)/5000) as block, start_time, text, duration
    from table t0
    inner join 
        (select id, min(start_time) as st from table group by 1) as t1
        on t0.ref_id = t1.ref_id
    order by 1, 2, 3) t2
group by 1, 2;

I think the reason I need the third is because I couldn't order by start_time, and it matters for some of aggregations (text concatenation, for example).

I'd love some help if anyone's an expert!

Gordon Linoff · Accepted Answer

I'm not surer why that "block" is so important, but you can remove one level of subqueries:

select user_id, FLOOR((start_time - minstart)/5000),
       string_agg(text, ' '), sum(duration)
from (select user_id,  as block, 
             start_time, text, duration,
             min(start_time) over (partition by id) as minstart
      from table t0
      ) t
group by 1, 2;

Any way to clean up this complex aggregate function?

Answers (2)

Related Questions