SQL query to find most prevalent associated value in table

Question

I have an simple SQL table that associates two values which looks like this:

table1(column1 varchar (32), column2 varchar(32));

For each distinct value in column1, out of the list of value it has, I want to find the one that occurs the most in this table.

An example to clarify:

let's say I have the following values:

a1, b1
a2, b2
a3, b3
a4, b1
a3, b1
a3, b2
a5, b1
a6, b2

the result I hope for is:

a1, b1
a2, b2
a3, b1
a4, b1
a5, b1
a6, b2

because b1 and b2 have the most occurrences in the table.

Gordon Linoff · Accepted Answer

This is a good application for window functions. There is more than one way to approach it. Here is one method. Get the frequency of column2 on each row. Then, rank all these frequencies using row_number():

select column1, column2
from (select t.*,
             row_number() over (partition by column1 order by col2cnt desc) as seqnum
      from (select t.*, count(*) over (partition by column2) as col2cnt
            from t 
           ) t
     ) t
where seqnum = 1

The final step (done by the outermost query) is to choose one with the highest rank (that is with the highest count).

In the event of ties (that is if b2 appeared as often as b1), then this version would choose an arbitrary value.

SQL query to find most prevalent associated value in table

Answers (1)

Related Questions