U W
U W

Reputation: 1290

Redshift: Getting rank of a row, filtered by a condition

Every time I add a row to a table, I want to know where it ranks in comparison with the table up to that point. This is easily done with the RANK() window function. However, I'm struggling to find a way to to discover where it ranks in comparison with the table up until that point filtered by a value.

As an example, I'm wanting to end up with this highly contrived table:

    date    |   name  | animal_bought | num_sloths_bought_before | num_camels_bought_before
------------+---------+---------------+--------------------------+--------------------------
 2014-09-01 | Vincent | sloth         | 0                        | 0
 2014-09-01 | Luis    | camel         | 0                        | 0
 2014-09-02 | Vincent | sloth         | 1                        | 0
 2014-09-02 | Luis    | camel         | 0                        | 1
 2014-09-02 | Kevin   | sloth         | 0                        | 0
 2014-09-03 | Vincent | camel         | 1                        | 0
 2014-09-04 | Deo     | camel         | 0                        | 0
 2014-09-04 | Vincent | sloth         | 2                        | 1
 2014-09-05 | Luis    | camel         | 0                        | 2
 2014-09-05 | Andrew  | sloth         | 0                        | 0

I was initially looking to see whether I could apply a filter to the window function (eg. RANK() OVER(PARTITION BY name WHERE animal_bought = 'sloth' ORDER BY date ASC) AS num_sloths_bought_before) but this isn't syntactically correct. I then tried adding a sub-query, as follows:

SELECT
  date,
  name,
  animal_bought,
  ( SELECT
      RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
    FROM this_table
    WHERE animal_bought = 'sloth'
  ) AS num_sloths_bought_before
FROM source_table

but Redshift threw this error:

ERROR:  This type of correlated subquery pattern is not supported yet

I've also tried putting the window function in a case statement (throws the same error) and calculating the ranks in a join query (not been able to make it work).

Upvotes: 3

Views: 15449

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269653

Hmmm. I don't think this query would do what you want anyway:

SELECT date, name, animal_bought,
       (SELECT RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
        FROM this_table
        WHERE animal_bought = 'sloth'
       ) AS num_sloths_bought_before
FROM source_table

For a few reasons:

  • The use of rank() suggests that there is more than one row in this_table that matches animal_bought. Otherwise, you could use an aggregation function.
  • If there is only one row that matches the where clause, then the value is always 1, because the where clause is processed before the rank().
  • Your question only mentions one table but your query has two

Perhaps you just want rank() without a subquery?

SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal ORDER BY date ASC) - 1 as NumberBoughtBefore
FROM source_table;

If you want it for both animals, then don't use rank(), use cumulative sum:

SELECT date, name, animal_bought,
       sum(case when animal = 'sloth' then 1 else 0 end) over (partition by name order by date) as SlothsBefore,
       sum(case when animal = 'camel' then 1 else 0 end) over (partition by name order by date) as CamelsBefore
FROM source_table;

EDIT:

SELECT date, name, animal_bought,
       (sum(case when animal = 'sloth' then 1 else 0 end) over (partition by name order by date) -
        (case when animal = 'sloth' then 1 else 0 end)
       ) as SlothsBefore,
       (sum(case when animal = 'camel' then 1 else 0 end) over (partition by name order by date) -
        (case when animal = 'camel' then 1 else 0 end)
       ) as CamelsBefore
FROM source_table;

Upvotes: 3

Related Questions