Reputation: 1081
I have a table that looks like this:
category name1 name2 name3 name4 name5
1 John Sam John Katy Cat
1 John Ivan Bob Andrew Tom
1 Sam Ivan George Bob Tom
2 Jack Siri Elsa Noah Anna
2 Jack Bob Tomas Noah Tom
What I need to do, is to find the most common value in each column for each category. That is, I need the following result:
category name1 name2 name3 name4 name5
1 John Ivan John Katy Tom
2 Jack Siri Elsa Noah Anna
If there are several values with the same frequency, any of them may be chosen.
So far, I have only managed to do that for one column with this script:
SELECT top(1) category, name1, COUNT(name1) AS freq
FROM data
GROUP BY category, name1
ORDER BY freq DESC
But how do I do that for several columns in SQL Server?
Upvotes: 0
Views: 147
Reputation: 1269873
If you don't mind the results in rows, you can unpivot, which makes this much simpler:
select category, which, name
from (select t.category, v.which, v.name, count(*) as cnt,
row_number() over (partition by t.category, v.which order by count(*) desc) as seqnum
from t cross apply
(values (1, name1), (2, name2), (3, name3), (4, name4), (5, name4)
) v(which, name)
from t
group by t.category, v.which, v.name
) cwn
where seqnum = 1;
You can re-pivot if you want this in columns:
with cwn as (
select t.category, v.which, v.name, count(*) as cnt,
row_number() over (partition by t.category, v.which order by count(*) desc) as seqnum
from t cross apply
(values (1, name1), (2, name2), (3, name3), (4, name4), (5, name4)
) v(which, name)
from t
group by t.category, v.which, v.name
)
select category,
max(case when which = 1 then name end) as name1,
max(case when which = 2 then name end) as name2,
max(case when which = 3 then name end) as name3,
max(case when which = 4 then name end) as name4,
max(case when which = 5 then name end) as name5
from cwn
where seqnum = 1
group by category
Upvotes: 1
Reputation: 86735
One option, with lots of repetition, but accommodating your current structure...
(Though it still has the same non-deterministic/arbitrary behaviour when dealing with equally frequent names)
WITH
counted AS
(
SELECT
category,
name1,
COUNT(*) OVER (PARTITION BY category, name1) AS name1_freq,
name2,
COUNT(*) OVER (PARTITION BY category, name2) AS name2_freq
FROM
yourTable
),
ranked AS
(
SELECT
category,
name1,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY name1_freq DESC) AS name1_rank,
name2,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY name2_freq DESC) AS name2_rank
FROM
counted
)
SELECT
category,
MAX(CASE WHEN name1_rank = 1 THEN name1 END) AS name1_most_common,
MAX(CASE WHEN name2_rank = 1 THEN name2 END) AS name2_most_common
FROM
ranked
GROUP BY
category
As you can see, that's a LOT of repetition. Which is why SQL is predicated on normalised data-structures. So much so that normalising then denormalising your structure may be a valid option...
WITH
normalised(category, col, name) AS
(
SELECT category, 1, name1 FROM yourTable
UNION ALL SELECT category, 2, name2 FROM yourTable
),
counted AS
(
SELECT
category, col, name, COUNT(*) AS freq
FROM
normalised
GROUP BY
category, col, name
),
ranked AS
(
SELECT
category, col, name,
ROW_NUMBER() OVER (PARTITION BY category, col ORDER BY freq DESC) AS rank
FROM
counted
)
SELECT
category,
MAX(CASE WHEN col = 1 THEN name END) AS name1_most_common,
MAX(CASE WHEN col = 2 THEN name END) AS name2_most_common
FROM
ranked
WHERE
rank = 1
GROUP BY
category
Upvotes: 1
Reputation: 164099
First create a CTE that uses COUNT()
window function to return the number of times each name occurs in each category and then use FIRST_VALUE()
window function to get for each column the name that occurs the most:
WITH cte AS (
SELECT *,
COUNT(*) OVER (PARTITION BY category, name1) count1,
COUNT(*) OVER (PARTITION BY category, name2) count2,
COUNT(*) OVER (PARTITION BY category, name3) count3,
COUNT(*) OVER (PARTITION BY category, name4) count4,
COUNT(*) OVER (PARTITION BY category, name5) count5
FROM tablename
)
SELECT DISTINCT category,
FIRST_VALUE(name1) OVER (PARTITION BY category ORDER BY count1 DESC) name1,
FIRST_VALUE(name2) OVER (PARTITION BY category ORDER BY count2 DESC) name2,
FIRST_VALUE(name3) OVER (PARTITION BY category ORDER BY count3 DESC) name3,
FIRST_VALUE(name4) OVER (PARTITION BY category ORDER BY count4 DESC) name4,
FIRST_VALUE(name5) OVER (PARTITION BY category ORDER BY count5 DESC) name5
FROM cte
See the demo.
Upvotes: 4