Reputation: 1836
I have a table which has no primary key and I can't add one - the relevant columns from it are:
Department | Category |
-------------+-----------+
0001 | A |
0002 | D |
0003 | A |
0003 | A |
0003 | C |
0004 | B |
I want to retrieve a single row for each Department
, which gives me the department code and the Category
which appears most frequently in the table, i.e.
Department | Category |
-------------+-----------+
0001 | A |
0002 | D |
0003 | A |
0004 | B |
What is the best way to achieve this? My current attempt involves a Count(Category)
in a subquery from which the Max(CountofCategory)
is then taken, but including the Category
field at this stage means too many rows at returned (since GROUP BY
is applied at Category
level as well as Department
). In the case of a tie, I'd just select the min/max of the category arbitrarily. Ideally this should be database-agnostic, but is likely to run on either Oracle or MySQL.
Upvotes: 1
Views: 6691
Reputation: 21
You can try the following as well. Window here returns the Categories ordered by their descending frequency of match against each Department. FIRST_VALUE() picks the first one from this.
SELECT DISTINCT (department),
FIRST_VALUE(category) OVER
(PARTITION BY department ORDER BY count(*) DESC ROWS UNBOUNDED PRECEDING)
FROM T
GROUP BY department, category;
Upvotes: 1
Reputation: 1
There's an easier way:
select department, stats_mode(category) from T ;
works great when only the most frequent value is needed, when you need 2nd, 3rd... most frequent you have to do the counting like above.
Upvotes: 0
Reputation: 18410
Works in both Oracle and SQL Server, I believe is all standard SQL, from later standards:
with T_with_RN as
(select Department
, Category
, row_number() over (partition by Department order by count(*) Desc) as RN
from T
group by Department, Category)
select Department, Category
from T_with_RN
where RN = 1
EDIT I don't know why I used the WITH, the solution is probably easier to read using an inline view:
select Department, Category
from (select Department
, Category
, row_number() over (partition by Department order by count(*) Desc) as RN
from T
group by Department, Category) T_with_RN
where RN = 1
END EDIT
Test cases:
create table T (
Department varchar(10) null,
Category varchar(10) null
);
-- Original test case
insert into T values ('0001', 'A');
insert into T values ('0002', 'D');
insert into T values ('0003', 'A');
insert into T values ('0003', 'A');
insert into T values ('0003', 'C');
insert into T values ('0004', 'B');
-- Null Test cases:
insert into T values (null, 'A');
insert into T values (null, 'B');
insert into T values (null, 'B');
insert into T values ('0005', null);
insert into T values ('0005', null);
insert into T values ('0005', 'X');
-- Tie Test case
insert into T values ('0006', 'O');
insert into T values ('0006', 'P');
Upvotes: 3
Reputation: 33197
You'll have to clean this up if you are better with subqueries than I am, but in my testing this produced the result you want:
SELECT
main.Department as Department,
(SELECT
Category
FROM yourtable
WHERE Department=main.Department
GROUP BY Category
ORDER BY COUNT(Category) DESC
LIMIT 1) AS Category
FROM yourtable main
GROUP BY main.Department
The trick is just to get the one row in the subquery to return the max value you want with the ORDER BY and the "LIMIT 1"
Upvotes: 0