Reputation: 73
I have three tables described below:
Area (Id, Description)
City(Id, Name)
Problem(Id, City, Area, Definition):
City references City (Id), Area references Area (Id)
I want to find the most frequent value of Area(Description) that appears in Problem for each City (Name).
Example:
Area
Id Description
1 Support
2 Finance
City
Id Name
1 Chicago
2 Boston
Problem
Id City Area Definition
1 1 2 A
2 1 2 B
3 1 1 C
4 2 1 D
Desired Output:
Name Description
Chicago Finance
Boston Support
Here's what I have tried with no success :
SELECT Name,
Description
FROM
(SELECT *
FROM Problem AS P,
City AS C,
Area AS A
WHERE C.Id = P.City
AND A.Id = P.Area ) AS T1
WHERE Description =
(SELECT Description
FROM
(SELECT *
FROM Problem AS P,
City AS C,
Area AS A
WHERE C.Id = P.City
AND A.Id = P.Area ) AS T2
WHERE T1.Name = T2.Name
GROUP BY Description
ORDER BY Count(Name) DESC LIMIT 1 )
GROUP BY Name,
Description
Thanks!
Upvotes: 0
Views: 2577
Reputation: 133390
The Max For each city, and area should be
select C.Name, A.Description from (
select t1.City, t1.Area, max(freq) as max_freq
from (
select P.City, P.Area, count(*) as Freq
from Problem as P
group by P.City, P.Area
) t1
) t2
INNER JOIN City AS C ON t2.City = C.Id
INNER JOIN Area AS A ON A.Id = t2.Area
Upvotes: 1
Reputation: 31812
This is probably the shortest way to solve your issue:
select c.Name, a.Description
from City c
cross join Area a
where a.Id = (
select p.Area
from Problem p
where p.City = c.Id
group by p.Area
order by count(*) desc, p.Area asc
limit 1
)
We use a CROSS JOIN to combine every City
with every Area
. But we pick only the Area
with the highest count in the Problem
table for the given city, which is determined in the correlated subquery. If two areas have the same highest count for a city, the one coming first alphabetically will be picked (order by ... p.Area asc
).
Result:
| Name | Description |
|---------|-------------|
| Boston | Support |
| Chicago | Finance |
Here's another more complex solution which includes the count.
select c.Name, a.Description, city_area_maxcount.mc as problem_count
from (
select City, max(c) as mc
from (
select p.City, p.Area, count(*) as c
from problem p
group by p.City, p.Area
) city_area_count
group by City
) city_area_maxcount
join (
select p.City, p.Area, count(*) as c
from problem p
group by p.City, p.Area
) city_area_count
on city_area_count.City = city_area_maxcount.City
and city_area_count.c = city_area_maxcount.mc
join City c on c.Id = city_area_count.City
join Area a on a.Id = city_area_count.Area
The subquery alisaed as city_area_maxcount
is used twice here (i hope mysql can cache the result). If you think of it as a table, that would be a common find-the-row-with-top-value-per-group problem. If two areas have the same highest count for a city, both will be selected.
Result:
| Name | Description | problem_count |
|---------|-------------|---------------|
| Boston | Support | 1 |
| Chicago | Finance | 2 |
Demo: http://sqlfiddle.com/#!9/c66a5/2
Upvotes: 2