hamsy
hamsy

Reputation: 369

Rank in SQL ordered by date

I'm trying to create a rank that would be partitioned by 2 columns and ordered in the chronological order.

This is my original table, daily table with company category.

create table T1
(
    observation_date date,
    company_id varchar(16383),
    company_category varchar(16383)
);

insert into T1 values (company_id, calendar_date, company_category) 
                        ('aaaa', '2020-01-31', 'new');
                        ('aaaa', '2020-02-01', 'new');
                        ('aaaa', '2020-02-02', 'new');
                        ('aaaa', '2020-02-03', 'new')
                        ('aaaa', '2020-08-20', 'converted');
                        ('aaaa', '2020-08-21', 'big');
                        ('aaaa', '2020-08-22', 'big');
                        ('cccc', '2020-12-02', 'new');
                        ('cccc', '2020-12-03', 'converted');
                        ('cccc', '2020-12-04', 'big');
                        ('aaaa', '2020-09-18', 'converted');
                        ('aaaa', '2020-09-19', 'converted');

I want to create a rank based on company and its category in the chronological order. If at some point a company gets a category it had before, the rank will still increment as long there was another category in between ( see company aaaa on '2020-08-20' and '2018-06-18'.

create table T2
(
    observation_date date,
    company_id varchar(16383),
    company_category varchar(16383),
    ranked integer
);
insert into T2 values (company_id, calendar_date, company_category, ranked) 
                        ('aaaa', '2020-01-31', 'new', 1);
                        ('aaaa', '2020-02-01', 'new', 1);
                        ('aaaa', '2020-02-02', 'new', 1);
                        ('aaaa', '2020-02-03', 'new', 1);
                        ('aaaa', '2020-08-20', 'converted', 2);
                        ('aaaa', '2020-08-21', 'big', 3);
                        ('aaaa', '2020-08-22', 'big', 3);
                        ('cccc', '2020-12-02', 'new', 1);
                        ('cccc', '2020-12-03', 'converted', 2);
                        ('cccc', '2020-12-04', 'big', 3);
                        ('aaaa', '2020-09-18', 'converted', 4);
                        ('aaaa', '2020-09-19', 'converted', 4);

Can't wrap up my head around how to make it work. I have tried doing:

rank() over (partition by company_id,company_category order by calendar_date)

but then usage of calendar_date in the order clause creates unique ranks per date with restart on each partition.

dense_rank() over (partition by company_id order by company_category)

Is getting closer but ordering by company_category screws the chronological order.

Any tips are appreciated. Thank you in advance!

Upvotes: 1

Views: 270

Answers (1)

Salman Arshad
Salman Arshad

Reputation: 272106

RANK won't work. You need to track changes in category by comparing current value with previous value; then number the rows such that the number increments every time there is a change. Something like:

WITH cte AS (
    SELECT *
         , CASE WHEN company_category = LAG(company_category) OVER (PARTITION BY company_id ORDER BY observation_date) THEN 0 ELSE 1 END AS changed
    FROM t1
)
SELECT *
     , SUM(changed) OVER (PARTITION BY company_id ORDER BY observation_date) AS ranked
FROM cte
ORDER BY company_id, observation_date

Upvotes: 3

Related Questions