Reputation: 6020
I have a table with userIds
and product categories prod
. I want to get a table of unique userIds
and associated most occurring product categories prod
. In other words, I want to know what item categorys each customer is buying the most. How can I achieve this in PL/SQL or Oracle SQL?
|userId|prod|
|------|----|
|123544|cars|
|123544|cars|
|123544|dogs|
|123544|cats|
|987689|bats|
|987689|cats|
I have already seen SO questions for getting the most common value of a column, but how do I get the most common value for each unique userId
?
Upvotes: 1
Views: 591
Reputation: 12486
SELECT user_id, prod, prod_cnt FROM (
SELECT user_id, prod, prod_cnt
, RANK() OVER ( PARTITION BY user_id ORDER BY prod_cnt DESC ) AS rn
FROM (
SELECT user_id, prod, COUNT(*) AS prod_cnt
FROM mytable
GROUP BY user_id, prod
)
) WHERE rn = 1;
In the innermost subquery I am getting the COUNT
of each product by user. Then I rank them using the analytic (window) function RANK()
. Then I simply select all of those where the RANK
is equal to 1. Using RANK()
instead of ROW_NUMBER()
ensures that ties will be returned.
Upvotes: 2
Reputation: 3344
You should use just SQL to solve this .. if you really need it in pl/sql, just imbed this query within plsql ..
(setup)
drop table yourtable;
create table yourtable (
userID number,
prod varchar2(10)
)
/
insert into yourtable values ( 123544, 'cars' );
insert into yourtable values ( 123544, 'cars' );
insert into yourtable values ( 123544, 'dogs' );
insert into yourtable values ( 123544, 'cats' );
insert into yourtable values ( 987689, 'bats' );
insert into yourtable values ( 987689, 'cats' );
commit;
-- assuming ties are not broken, this logic returns both ties
with w_grp as (
select userID, prod, count(*) over ( partition by userID, prod ) rgrp
from yourtable
),
w_rnk as (
select userID, prod, rgrp,
rank() over (partition by userID order by rgrp desc) rnk,
from w_grp
)
select distinct userID, prod
from w_rnk
where rnk = 1
/
USERID PROD
---------- ----------
987689 bats
987689 cats
123544 cars
-- assuming you just want 1 .. this will return 1 random one if they are tied. (ie this time it pulled 987689 bats, next time it might pull 987689 cats. It will always return 123544 cars, however, since there is no tie for that one.
with w_grp as (
select userID, prod, count(*) over ( partition by userID, prod ) rgrp
from yourtable
),
w_rnk as (
select userID, prod, rgrp,
row_number() over (partition by userID order by rgrp desc) rnum
from w_grp
)
select userID, prod, rnum
from w_rnk
where rnum = 1
/
USERID PROD RNUM
---------- ---------- ----------
123544 cars 1
987689 bats 1
[edit] Cleaned up unused rank/row_number from functions to avoid confusion [/edit]
Upvotes: 3