Reputation: 23

SQL function to return "most common value" for multiple columns in a Group By

I'm looking to find the simplest way to return the most common value in multiple column results of a select statement that is grouped. Everything I am finding online points to RANK on a single item in the select or dealing with each column separately outside of a GROUP BY.

Sample Data:

SELECT 100 as "auser", 
'A' as "instance1", 'M' as "instance2" 
union all select 100, 'B', 'M' 
union all select 100,'C', 'N' 
union all select 100, 'B', 'O'
union all select 200,'D', 'P' 
union all select 200, 'E', 'P' 
union all select 200,'F', 'P' 
union all select 200, 'F', 'Q'

Sample Data result:

auser   instance1   instance2
100     A           M
100     B           M
100     C           N
100     B           O
200     D           P
200     E           P
200     F           P
200     F           Q

Query logic (how I see it in my head):

SELECT auser, most_common(instance1), most_common(instance2)
FROM datasample
GROUP BY auser;

Desired result:

100     B           M
200     F           P

Upvotes: 2

Answers (3)

Arunav dutta gupta

Reputation: 109

Just Go Through Simple

Select auser, instance1, instance2 FROM datasample GROUP BY auser,instance1, instance2 ;

Upvotes: 0

Tim Lehner

Reputation: 15261

I'm not sure if I can find something much more elegant, but this may do if you're on SQL 2005+ (since I'm using a ranking function and CTEs):

with instance1 as (
    select auser, instance1
        , row_number() over (partition by auser order by count(*) desc, instance1) as row_num
    from datasample
    group by auser, instance1
), instance2 as (
    select auser, instance2
        , row_number() over (partition by auser order by count(*) desc, instance2) as row_num
    from datasample
    group by auser, instance2
)
select a.auser, a.instance1, b.instance2
from instance1 as a 
    join instance2 as b on a.auser = b.auser
where a.row_num = 1
    and b.row_num = 1
order by a.auser;

I'm not sure how you wish to handle nulls, and moving the row_num equivalency to the join condition doesn't change the execution plan on my box.

If you're on SQL Server 2000, then you can replace these CTEs with derived tables and fake the row_number() through the use of count and a "triangular join".

Upvotes: 1

Gordon Linoff

Reputation: 1271151

This approach to solving this problem uses nested window functions. The innermost subquery calculates the count for each column. The next subquery ranks these (using row_number() ). The outer query then uses conditional aggregation to get the results that you want:

select auser, MAX(case when seqnum1 = 1 then instance1 end),
       MAX(case when seqnum2 = 1 then instance2 end)
from (select t.*,
             ROW_NUMBER() over (partition by auser order by cnt1 desc) as seqnum1,
             ROW_NUMBER() over (partition by auser order by cnt2 desc) as seqnum2
      from (select t.*,
                   count(*) over (partition by auser, instance1) as cnt1,
                   COUNT(*) over (partition by auser, instance2) as cnt2
            from t
           ) t
     ) t
group by auser

Upvotes: 4

SQL function to return &quot;most common value&quot; for multiple columns in a Group By

Answers (3)

Related Questions

SQL function to return "most common value" for multiple columns in a Group By