FullStack
FullStack

Reputation: 6020

Get most commonly occurring value for each user id

I have a table with userIds and product categories prod. I want to get a table of unique userIds and associated most occurring product categories prod. In other words, I want to know what item categorys each customer is buying the most. How can I achieve this in PL/SQL or Oracle SQL?

|userId|prod|
|------|----|
|123544|cars|
|123544|cars|
|123544|dogs|
|123544|cats|
|987689|bats|
|987689|cats|

I have already seen SO questions for getting the most common value of a column, but how do I get the most common value for each unique userId?

Upvotes: 1

Views: 591

Answers (2)

David Faber
David Faber

Reputation: 12486

SELECT user_id, prod, prod_cnt FROM (
    SELECT user_id, prod, prod_cnt
         , RANK() OVER ( PARTITION BY user_id ORDER BY prod_cnt DESC ) AS rn
      FROM (
        SELECT user_id, prod, COUNT(*) AS prod_cnt
          FROM mytable
         GROUP BY user_id, prod
    )
) WHERE rn = 1;

In the innermost subquery I am getting the COUNT of each product by user. Then I rank them using the analytic (window) function RANK(). Then I simply select all of those where the RANK is equal to 1. Using RANK() instead of ROW_NUMBER() ensures that ties will be returned.

Upvotes: 2

Ditto
Ditto

Reputation: 3344

You should use just SQL to solve this .. if you really need it in pl/sql, just imbed this query within plsql ..

(setup)

  drop table yourtable;
  create table yourtable (
     userID   number,
     prod     varchar2(10)
     )
  /

  insert into yourtable values ( 123544, 'cars' );
  insert into yourtable values ( 123544, 'cars' );
  insert into yourtable values ( 123544, 'dogs' );
  insert into yourtable values ( 123544, 'cats' );
  insert into yourtable values ( 987689, 'bats' );
  insert into yourtable values ( 987689, 'cats' );

  commit;

-- assuming ties are not broken, this logic returns both ties

  with w_grp as (
        select userID, prod, count(*) over ( partition by userID, prod ) rgrp
          from yourtable
        ),
     w_rnk as (
        select userID, prod, rgrp,
               rank() over (partition by userID order by rgrp desc) rnk,
          from w_grp
        )
  select distinct userID, prod
    from w_rnk
   where rnk = 1
  /

      USERID PROD
  ---------- ----------
      987689 bats
      987689 cats
      123544 cars

-- assuming you just want 1 .. this will return 1 random one if they are tied. (ie this time it pulled 987689 bats, next time it might pull 987689 cats. It will always return 123544 cars, however, since there is no tie for that one.

  with w_grp as (
        select userID, prod, count(*) over ( partition by userID, prod ) rgrp
          from yourtable
        ),
     w_rnk as (
        select userID, prod, rgrp,
               row_number() over (partition by userID order by rgrp desc) rnum
          from w_grp
        )
  select userID, prod, rnum
    from w_rnk
   where rnum = 1
  /

      USERID PROD             RNUM
  ---------- ---------- ----------
      123544 cars                1
      987689 bats                1

[edit] Cleaned up unused rank/row_number from functions to avoid confusion [/edit]

Upvotes: 3

Related Questions