dimButTries
dimButTries

Reputation: 886

Athena row_number function using in combination with where or count

Bit of a newbie here and struggling to get to grips with what should be rudimentary tasks.

I have a simple table with products, many of which have multiple colours. I want to know how many products have more than three colours.

I started with something like the below:

SELECT product_colour, 
       product_id,
       row_number() over (partition by product_colour) AS row_num
FROM table
WHERE row_num > 3
ORDER BY product_id

The above did not work, because of an outer/inner logic in SQL, Googled it earlier, so misinterpretation is likely.

I attempted this, having followed a tutorial.

SELECT rn,
       product_colour,
       product_id,
FROM 
(
  SELECT row_number() over (partition by product_colour ORDER BY product_colour) AS rn,
         product_colour,
         product_id,
  FROM table
) AS t
HAVING COUNT(t.rn) > 3

No matter what combinations of syntax "stuff" I try in the vein hope of getting past this error message "'"rn"' must be an aggregate expression or appear in GROUP BY clause". I simply cannot do it.

Having been inspired by the first responder, I have managed to get the closest yet, however, this query no longer returns any error messages, just an empty table.

SELECT product_colour,
       t.rn,
       product_id,
FROM 
(
  SELECT row_number() over (partition by product_colour) AS rn,
         product_colour,
         product_id
  FROM table
) AS t
GROUP BY product_colour, product_id, t.rn
HAVING COUNT(t.rn) > 3
ORDER BY product_colour, t.rn

Any ideas?

Upvotes: 0

Views: 8067

Answers (1)

GMB
GMB

Reputation: 222622

If you want the list of products that have 3 colours or more, no need for window functions, you can just use aggregation:

select product_id
from mytable
group by product_id
having count(distinct product_colour) >= 3

If you want to know how many products have more 3 colours or more, then you can add another level of aggregation:

select count(*)
from (
    select 1
    from mytable
    group by product_id
    having count(distinct product_colour) >= 3
) t

Note: if (product_id, product_colour) types are unique in your table, there is no need for distinct.

Upvotes: 2

Related Questions