Reputation: 607

How to get distinct count for last x weeks data but group by week in redshift?

I have a below query which I run gives me the single count for month as current month and dates_for_week has list of all the dates for last week from Sunday to Saturday.

select COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and (month = '10' and dates_for_week IN  ('18', '19', '20', '21', '22', '23', '24'))

As of now the output I see is this -

Count
-----
982

Now I am trying to make this query dynamic such that it can give me count for past 6 weeks something like below as an output:

Count   Week
------------
982     W43
123     W42
126     W41
127     W40
128     W39
129     W38

I am able to convert above query in a dynamic way which gives me the count for current month october and previous week which is 43 and it works fine as shown below but I am not sure how can I change it so that it can give me data for all past 6 weeks in the above output format. It looks like I need to change the month also dynamically for some week to get output for past 6 weeks.

select COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and (
    month = extract(month from current_date)
    and dates_for_week IN (
            select
                date_part('d',((DATE_TRUNC('week', CURRENT_DATE) - 9) + row_number() over (order by true))::date)
            from process.data
            limit 7
  )
)

So what I need is this for last 6 week and group by week to give me the count as shown above. Is this possible to do by any chance?

and (month = '10' and dates_for_week IN  ('18', '19', '20', '21', '22', '23', '24'))
and (month = '10' and dates_for_week IN  ('11', '12', '13', '14', '15', '16', '17'))
and (month = '10' and dates_for_week IN  ('4', '5', '6', '7', '8', '9', '10'))
and (month IN ('9', '10') and dates_for_week IN  ('27', '28', '29', '30', '1', '2', '3'))
and (month = '9' and dates_for_week IN  ('20', '21', '22', '23', '24', '25', '26'))
and (month = '9' and dates_for_week IN  ('13', '14', '15', '16', '17', '18', '19'))

Upvotes: 0

Answers (3)

Philipp Johannis

Reputation: 2956

You have years, months and days in seperate columns if I understand it correctly. I think the easiest way is to "build" a proper date column and then working with this column.

The following query should give you the last 6 weeks including the current week.

select 
EXTRACT(week from TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD')) week_num
,COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
and TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD') >= DATEADD(day,-42,DATE_TRUNC('week', sysdate)) 
GROUP BY 1
ORDER BY 1 desc

However, there might be a challenge as weeks start on a Monday in redshift, so might need to do a slight manipulation (adding one day):

select 
EXTRACT(week from DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD'))) week_num
,COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
and DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD')) BETWEEN DATEADD(day,-42,DATE_TRUNC('week', sysdate)) AND DATEADD(day,-1,DATE_TRUNC('week', sysdate)) 
GROUP BY 1
ORDER BY 1 desc

Debugging:

I would start running this query first, to check if the date is calculated properly

select COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD') between '2020-10-18' and '2020-10-24'

Afterwards I would see if the week is calculated correctly:

select 
EXTRACT(week from DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD'))) week_num
,COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD') between '2020-10-18' and '2020-10-24'
group by 1
order by 1

And last but not least I would extend the timeframe and make it dynamic:

select 
EXTRACT(week from DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD'))) week_num
,COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD')) Between DATEADD(day,-42,DATE_TRUNC('week', sysdate)) and DATEADD(day,-1,DATE_TRUNC('week', sysdate)) 
group by 1
order by 1

Upvotes: 2

Gordon Linoff

Reputation: 1271003

You could use order by and limit:

select year, week, COUNT(DISTINCT CLIENTID)
from process.data
where type = 'pots' and
      stype= 'kites' and
      tires IN ('abc', 'def', 'ghi', 'jkl') and
      comp IN ('data', 'hello', 'world')
group by year, dates_for_week
order by year desc, week desc
limit 6;

This is assuming that you have a week column, which seems like a reasonable assumption.

This is a simple way to accomplish what you want to do. I am guessing that on Redshift it should have decent performance.

Upvotes: 0

Jon Scott

Reputation: 4354

Assuming that you have some kind of date column, you can simply use something like this

select date_part(w, {your_date_column) as week_number,
COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
group by 1

Upvotes: 0

How to get distinct count for last x weeks data but group by week in redshift?

Answers (3)

Debugging:

Related Questions