SQL - Most frequent value in column of joined tables

Question

I have three tables described below:

Area (Id, Description)

City(Id, Name)

Problem(Id, City, Area, Definition):
 City references City (Id), Area references Area (Id)

I want to find the most frequent value of Area(Description) that appears in Problem for each City (Name).

Example:

Area
Id   Description
1      Support
2      Finance  

City
Id      Name
1      Chicago
2      Boston

Problem
Id  City  Area  Definition
1     1     2       A
2     1     2       B
3     1     1       C
4     2     1       D

Desired Output:

 Name         Description
 Chicago        Finance
 Boston         Support

Here's what I have tried with no success :

SELECT Name,
       Description
FROM
  (SELECT *
   FROM Problem AS P,
        City AS C,
        Area AS A
   WHERE C.Id = P.City
     AND A.Id = P.Area ) AS T1
WHERE Description =
    (SELECT Description
     FROM
       (SELECT *
        FROM Problem AS P,
             City AS C,
             Area AS A
        WHERE C.Id = P.City
          AND A.Id = P.Area ) AS T2
     WHERE T1.Name = T2.Name
     GROUP BY Description
     ORDER BY Count(Name) DESC LIMIT 1 )
GROUP BY Name,
         Description

Thanks!

Paul Spiegel · Accepted Answer

This is probably the shortest way to solve your issue:

select c.Name, a.Description
from City c
cross join Area a
where a.Id = (
    select p.Area
    from Problem p
    where p.City = c.Id
    group by p.Area
    order by count(*) desc, p.Area asc
    limit 1
)

We use a CROSS JOIN to combine every City with every Area. But we pick only the Area with the highest count in the Problem table for the given city, which is determined in the correlated subquery. If two areas have the same highest count for a city, the one coming first alphabetically will be picked (order by ... p.Area asc).

Result:

|    Name | Description |
|---------|-------------|
|  Boston |     Support |
| Chicago |     Finance |

Here's another more complex solution which includes the count.

select c.Name, a.Description, city_area_maxcount.mc as problem_count
from (
    select City, max(c) as mc
    from (
        select p.City, p.Area, count(*) as c
        from problem p
        group by p.City, p.Area
    ) city_area_count
    group by City
) city_area_maxcount
join (
    select p.City, p.Area, count(*) as c
    from problem p
    group by p.City, p.Area
) city_area_count
    on  city_area_count.City = city_area_maxcount.City
    and city_area_count.c = city_area_maxcount.mc
join City c on c.Id = city_area_count.City
join Area a on a.Id = city_area_count.Area

The subquery alisaed as city_area_maxcount is used twice here (i hope mysql can cache the result). If you think of it as a table, that would be a common find-the-row-with-top-value-per-group problem. If two areas have the same highest count for a city, both will be selected.

Result:

|    Name | Description | problem_count |
|---------|-------------|---------------|
|  Boston |     Support |             1 |
| Chicago |     Finance |             2 |

Demo: http://sqlfiddle.com/#!9/c66a5/2

SQL - Most frequent value in column of joined tables

Answers (2)

Related Questions