m0meni
m0meni

Reputation: 16435

PostgreSQL getting daily, weekly, and monthly averages of the occurrences of an event in one query

Currently I have this rather large query that works by

  1. Aggregating the daily, weekly, monthly counts into intermediate tables by taking the count() of an event grouped by the event name and the date.
  2. Selecting the avg count over each intermediate table by doing avg() group by just event, doing a union of the results, and because I want to have a separate column for daily, weekly, monthly, putting a filler value of 0 into empty columns.
  3. I then sum over all the columns, and the 0s basically act as a no-op, which gives me just a single value for each event.

The query is pretty large though, and I feel like I'm doing a lot of repetitive work. Is there any way to do this query better or make it smaller? I haven't really done queries like this before so I'm not quite sure.

WITH monthly_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('month', created_at)
),
weekly_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('week', created_at)
),
daily_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('day', created_at)
),
query as (
  SELECT
    event,
    0 as daily_avg,
    0 as weekly_avg,
    avg(count) as monthly_avg
  FROM monthly_counts
  GROUP BY event
  UNION
  SELECT
    event,
    0 as daily_avg,
    avg(count) as weekly_avg,
    0 as monthly_avg
  FROM weekly_counts
  GROUP BY event
  UNION
  SELECT
    event,
    avg(count) as daily_avg,
    0 as weekly_avg,
    0 as monthly_avg
  FROM daily_counts
  GROUP BY event
)
SELECT
  event,
  sum(daily_avg) as daily_avg,
  sum(weekly_avg) as weekly_avg,
  sum(monthly_avg) as monthly_avg
FROM query
GROUP BY event;

Upvotes: 11

Views: 19478

Answers (2)

klin
klin

Reputation: 121604

I'd write the query in a way like this:

select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
join (
    select event, avg(count) weekly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('week', created_at)
    ) s
    group by 1
) weekly using(event)
join (
    select event, avg(count) daily_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('day', created_at)
    ) s
    group by 1
) daily using(event)
order by 1;

If the where condition eliminates a significant portion of the data (say more than a half) the use of cte could slightly speed up the query execution:

with the_data as (
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
    )

select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from the_data
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
--  etc ... 

Just out of curiosity, I've done a test on the data:

create table tracking_stuff (event text, created_at timestamp);
insert into tracking_stuff
    select 'thing' || random_int(9), '2016-01-01'::date+ random_int(365)
    from generate_series(1, 1000000);

In every query I've replaced thing with thing1, so the queries eliminate about 2/3 of rows.

Average execution time of 10 tests:

    Original query          1106 ms
    My query without cte    1077 ms
    My query with cte        902 ms
    Clodoaldo's query       5187 ms

Upvotes: 10

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125214

In 9.5+ use grouping sets

The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then the results returned

select event,
    avg(total) filter (where day is not null) as avg_day,
    avg(total) filter (where week is not null) as avg_week,
    avg(total) filter (where month is not null) as avg_month    
from (
    select
        event,
        date_trunc('day', created_at) as day,
        date_trunc('week', created_at) as week,
        date_trunc('month', created_at) as month,
        count(*) as total
    from tracking_stuff
    where event in ('thing','thing2','thing3')
    group by grouping sets ((event, 2), (event, 3), (event, 4))
) s
group by event

To learn more about grouping sets, consider these tutorials: one, two

Upvotes: 10

Related Questions