Reputation: 115
I have the following table named population
:
╔════════════╦════════════╦════════════════╗
║ india ║ hyderabad ║ 50100 ║
║ india ║ delhi ║ 75000 ║
║ USA ║ NewYork ║ 25000 ║
║ USA ║ california ║ 30000 ║
║ india ║ delhi ║ 5000 ║
║ USA ║ NewYork ║ 75000 ║
╚════════════╩════════════╩════════════════╝
I need to write a SQL query to get data in the following format:
╔════════╦═════════╦══════════╗
║ india ║ delhi ║ 80000 ║
║ USA ║ NewYork ║ 100000 ║
╚════════╩═════════╩══════════╝
country name and the city with the highest population where multiple entries of the cities are summed up.
Upvotes: 1
Views: 290
Reputation: 49049
You could use a combination of GROUP_CONCAT and FIND_IN_SET. This query will return a comma separated list of cities for every country, ordered by population DESC:
SELECT country, GROUP_CONCAT(city ORDER BY pop DESC) AS cities
FROM population
GROUP BY country
and it will return something like this:
| country | cities |
|---------|--------------------------|
| india | delhi,hyderabad,delhi |
| USA | NewYok,california,NewYok |
then we can join this subquery back to the population table using FIND_IN_SET that returns the position of a city in the list of cities:
SELECT
p.country,
p.city,
SUM(p.pop)
FROM
population p INNER JOIN (
SELECT country, GROUP_CONCAT(city ORDER BY pop DESC) AS cities
FROM population
GROUP BY country
) m ON p.country=m.country
AND FIND_IN_SET(p.city, m.cities)=1
GROUP BY
p.country,
p.city
the join will succeed only on the city with the maximum population for every country: FIND_IN_SET(p.city, m.cities)=1
.
This will work only if there's one city with the maximum poluation, if there are more only one will be returned. This also is not standard SQL and will only work on MySQL or similar, other DBMS have window functions that will make this same query easier to write.
Upvotes: 1
Reputation: 727
The following answer is not correct as it uses a feature specific to Mysql which violates the ANSI standards. The result is not deterministic as it is not defined which city name will be returned when aggregating by country. Mostly it is the first entry which will be used, this is why sorting in the inner query makes this work in most cases. But beware: It is by definition not guaranteed to use the first city, hence there can be cases where this will output wrong results. Another case this answer does not cover, is when there are two cities with same population as max for a country. This solution will only output one city per country.
I would solve it with a inner subquery which gets all cities grouped and the outer filters only to get the largest by country.
SELECT
country, city, MAX(population_total) AS population_total
FROM
(
SELECT country, city, SUM(population) AS population_total
FROM tableName
GROUP BY country, city
ORDER BY population_total DESC
) AS t1
GROUP BY
country
Upvotes: -1
Reputation: 175686
You can use:
SELECT *
FROM (
SELECT country,city, SUM(pop) AS total
FROM population
GROUP BY country,city) AS sub
WHERE (country, total) IN (
SELECT country, MAX(total)
FROM (SELECT country,city, SUM(pop) AS total
FROM population
GROUP BY country,city
) as s
GROUP BY country
);
If two cities in the same country have the same highest total population you will get two cities for that country.
Output:
╔══════════╦═════════╦════════╗
║ country ║ city ║ total ║
╠══════════╬═════════╬════════╣
║ india ║ delhi ║ 80000 ║
║ USA ║ NewYork ║ 100000 ║
╚══════════╩═════════╩════════╝
Upvotes: 3