user12533955

Reputation: 559

How can I aggregate a count of rows by time intervals in PostgreSQL?

If I have a PostgreSQL table that has columns for datetime and for an array of items, such as:

| time                       | items                  |
| -------------------------- | ---------------------- |
| 2020-12-06 11:31:38.000    |  {item1, item2}        |
| 2020-12-06 11:48:11.304    |  {item1}               |
| 2020-12-06 11:48:48.654    |  {item1, item2, item3} |
| 2020-12-06 11:49:50.355    |  {item2}               |
| 2020-12-06 11:55:31.842    |  {item1, item2}        |

How can I query the table to aggregate the count of a specific item in equidistant time intervals?

For example, I'd like to count the occurrences of item1 in 5 minute intervals, so that the query result looks like this:

| start_time                 | end_time                            | item1 count     |
| -------------------------- | ----------------------------------- | --------------- |
| 2020-12-06 11:30:00.000    |  2020-12-06 11:34:59.999            |       1         |
| 2020-12-06 11:35:00.000    |  2020-12-06 11:39:59.999            |       0         |
| 2020-12-06 11:40:00.000    |  2020-12-06 11:44:59.999            |       0         |
| 2020-12-06 11:45:00.000    |  2020-12-06 11:49:59.999            |       2         |
| 2020-12-06 11:50:00.000    |  2020-12-06 11:54:59.999            |       0         |
| 2020-12-06 11:55:00.000    |  2020-12-06 11:59:59.999            |       1         |

I'm having a tough time figuring out which query would achieve this efficiently. I've been thinking that Postgres' date_trunc or grid might help, but I'm really not sure how to approach the problem. Any suggestions?

Upvotes: 2

Views: 1910

Answers (2)

Nick S

Reputation: 73

For PostgreSQL 14 or later you should probably use date_bin. As for 'item1', if you don't care about it possibly being listed multiple times in a single row and just want to count the rows where it's present, a simple filter will suffice:

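select 
    -- date_bin needs a third argument: the origin that buckets are aligned to
    date_bin('5 minutes', time, timestamp '2001-01-01')
    , count(*) filter (where 'item1' = any(items))
from your_table  -- placeholder name; "table" itself is a reserved word
group by 1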
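
As a rough sketch of how this behaves on the sample data from the question: the inline `sample` CTE, the origin timestamp and the column aliases below are assumptions made for illustration, not part of the original schema. The second column shows a common date_trunc-based equivalent for versions before 14, where date_bin is not available.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
with sample(time, items) as (
    values
        (timestamp '2020-12-06 11:31:38.000', array['item1', 'item2']),
        (timestamp '2020-12-06 11:48:11.304', array['item1']),
        (timestamp '2020-12-06 11:48:48.654', array['item1', 'item2', 'item3']),
        (timestamp '2020-12-06 11:49:50.355', array['item2']),
        (timestamp '2020-12-06 11:55:31.842', array['item1', 'item2'])
)
select
    -- PostgreSQL 14+: 5-minute buckets aligned to an assumed origin.
    date_bin('5 minutes', s.time, timestamp '2020-12-06 00:00:00') as start_time
    -- Older versions: truncate to the hour, then add whole 5-minute steps
    -- (integer division drops the remainder of the minute field).
    , date_trunc('hour', s.time)
        + (extract(minute from s.time)::int / 5) * interval '5 minutes' as start_time_pre14
    -- Count rows that contain 'item1' at least once.
    , count(*) filter (where 'item1' = any(s.items)) as item1_count
from sample s
group by 1, 2
order by 1;
```

Note that grouping alone only returns buckets that contain at least one row, so the empty 11:35, 11:40 and 11:50 intervals from the desired output do not appear; producing those needs a generated series of bucket start times to join against.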

Upvotes: 0

Gordon Linoff

Reputation: 1269445

You can use generate_series() to generate the timestamps. Then unnest, filter and aggregate:

select gs.ts, count(i.time) as num_item1
from generate_series(
         '2020-12-06 11:30:00.000'::timestamp,
         '2020-12-06 11:55:00.000',
         interval '5 minute'
     ) gs(ts)  -- one row per 5-minute bucket start
     left join
     (items i join lateral
      unnest(i.items) item
      on item = 'item1'
     )  -- one row per occurrence of 'item1'
     on i.time >= gs.ts and i.time < gs.ts + interval '5 minute'
group by gs.ts
order by 1;

Here is a db<>fiddle.
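
If the start_time / end_time layout from the question is wanted, and each row should count once no matter how often 'item1' repeats inside its array, a variation along these lines might work. It is only a sketch: the inline `sample` CTE stands in for the real table, the aliases are made up for illustration, and end_time is shown as an exclusive upper bound rather than the `:59.999` value in the question.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
with sample(time, items) as (
    values
        (timestamp '2020-12-06 11:31:38.000', array['item1', 'item2']),
        (timestamp '2020-12-06 11:48:11.304', array['item1']),
        (timestamp '2020-12-06 11:48:48.654', array['item1', 'item2', 'item3']),
        (timestamp '2020-12-06 11:49:50.355', array['item2']),
        (timestamp '2020-12-06 11:55:31.842', array['item1', 'item2'])
)
select
    gs.ts as start_time
    , gs.ts + interval '5 minutes' as end_time  -- exclusive upper bound
    -- count() skips NULLs, so buckets without a matching row report 0.
    , count(s.time) as item1_count
from generate_series(
         timestamp '2020-12-06 11:30:00',
         timestamp '2020-12-06 11:55:00',
         interval '5 minutes'
     ) as gs(ts)
left join sample s
       on s.time >= gs.ts
      and s.time <  gs.ts + interval '5 minutes'
      and 'item1' = any(s.items)  -- row-level membership test, no unnest
group by gs.ts
order by gs.ts;
```

Keeping the `'item1' = any(...)` test inside the join condition, rather than in a where clause, is what preserves the empty buckets: a where filter would discard the NULL-extended rows that the left join produces for them.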

Upvotes: 1
