arc000
arc000

Reputation: 873

SQL query to get most prevalent value in a column

I have two tables as below -

Sale Record:

    Date    |   Customer   |    ItemSold 
-----------------------------------------
11/01/2013  |     Alex     |     Pen
12/01/2013  |     Rony     |     Paper
13/01/2013  |     Alex     |     Eraser
14/01/2013  |     Marty    |     Eraser
15/01/2013  |     Alex     |     Pen
16/01/2013  |     Rob      |     Paper
17/01/2013  |     Alex     |     Pencil
18/01/2013  |     Alex     |     Pen
19/01/2013  |     Ned      |     Pen
20/01/2013  |     Alex     |     Paper
21/01/2013  |     Alex     |     Pencil
22/01/2013  |     Ned      |     Pen
23/01/2013  |     Alex     |     Eraser
24/01/2013  |     Alex     |     Pen
25/01/2013  |     Alex     |     Pen
26/01/2013  |     Alex     |     Paper
27/01/2013  |     Ned      |     Paper
28/01/2013  |     Alex     |     Pen
29/01/2013  |     Alex     |     Eraser
30/01/2013  |     Alex     |     Pen
31/01/2013  |     Rony     |     Pencil
01/02/2013  |     Alex     |     Eraser
02/02/2013  |     Ned      |     Paper
03/02/2013  |     Alex     |     Pen

Priority:

ItemName    |    Priority
--------------------------
Pen         |       1
Paper       |       2
Pencil      |       3
Eraser      |       4

I want to get a list to know which customer is likely to buy what as below -

Name   |   Item
----------------
Alex   |   Pen
Rob    |   Paper
Ned    |   Pen
Marty  |   Eraser
Rony   |   Paper

If there is a tie with items, the item with highest priority should be selected. Ned bought Pen and Paper each twice but Pen should be selected as it has more priority than paper.

What will be the sql query for this ?

Upvotes: 2

Views: 140

Answers (3)

roman
roman

Reputation: 117345

I think this would be the fastest way to do this. Note that I've used priority in the order by , but didn't use it in the group by - PostgreSQL allows that if you have a functional dependency from itemname to priority in Priority table:

select distinct on (s.customer)
    s.customer, p.itemname, count(*) as total
from sales as s
    inner join priority as p on p.itemname = s.itemsold
group by s.customer, p.itemname
order by s.customer, total desc, p.priority

If it's not possible, you can use this query:

select distinct on (s.customer)
    s.customer, s.itemsold, count(*) as total
from sales as s
    inner join priority as p on p.itemname = s.itemsold
group by s.customer, s.itemsold, p.priority
order by s.customer, total desc, p.priority;

sql fiddle demo

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269543

Statistically, the term for what you are looking for is the mode. Here is one way to calculate it using window/analytic functions:

select customer, ItemSold
from (select customer, ItemSold, count(*),
             row_number() over (partition by customer order by count(*) desc, p.priority
                               ) as seqnum
      from sales s left outer join
           priority p
           on s.ItemSold = p.ItemName
      group by customer, ItemSold
     ) ci
where seqnum = 1;

Upvotes: 1

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125214

SQL Fiddle

select distinct on (customer)
    customer, itemsold, total
from
    (
        select customer, itemsold, count(*) total
        from sales
        group by customer, itemsold
    ) s
    inner join priority on itemsold = itemname
order by customer, total desc, priority

Upvotes: 1

Related Questions