Tyler Brock
Tyler Brock

Reputation: 30136

Better way to do distinct rollup in redshift?

What is the best way to write a query that does a rollup of distinct counts over various discrete time ranges on redshift?

For example, if you have some phone numbers collected for leads in various organizations and want to find out how many distinct phone numbers were created weekly, monthly, quarterly, etc... what is the best way to do it?

This is the best I could come up with:

SELECT
  organization,
  sum(weekly) as weekly,
  sum(monthly) as monthly,
  sum(quarterly) as quarterly,
  sum(yearly) as yearly
FROM (
    SELECT 
      organization,
      COUNT(DISTINCT phoneNumber) as weekly,
      null as monthly,
      null as quarterly,
      null as yearly
    FROM Lead
    WHERE createdAt >= current_date - interval '7 days'
    GROUP BY organization

      UNION ALL

    SELECT
      organization,
      null as weekly,
      COUNT(DISTINCT phoneNumber) as monthly,
      null as quarterly,
      null as yearly
    FROM Lead
    WHERE createdAt >= current_date - interval '1 month'
    GROUP BY organization

        UNION ALL

    SELECT
      organization,
      null as weekly,
      null as monthly,
      COUNT(DISTINCT phoneNumber) as quarterly,
      null as yearly
    FROM Lead
    WHERE createdAt >= current_date - interval '3 months'
    GROUP BY organization

        UNION ALL

    SELECT
      organization,
      null as weekly,
      null as monthly,
      null as quarterly,
      COUNT(DISTINCT phoneNumber) as yearly
    FROM Lead
    WHERE createdAt >= current_date - interval '1 year'
    GROUP BY organization
) GROUP BY organization

Any way to make the query faster / easier to understand?

Upvotes: 0

Views: 464

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269493

If I understand correctly, you would just use conditional aggregation:

SELECT organization,
       COUNT(DISTINCT CASE WHEN created_at >= current_date - interval '7 day' THEN phoneNumber END) as weekly,
       COUNT(DISTINCT CASE WHEN created_at >= current_date - interval '1 month' THEN phoneNumber END) as monthly,
       COUNT(DISTINCT CASE WHEN created_at >= current_date - interval '3 month' THEN phoneNumber END) as quarterly,
       COUNT(DISTINCT CASE WHEN created_at >= current_date - interval '1 year' THEN phoneNumber END) as yearly
FROM Lead
WHERE createdAt >= current_date - interval '1 year'
GROUP BY organization;

Upvotes: 1

Related Questions