Li Haoyi
Li Haoyi

Reputation: 15802

Speeding up a slow SQL query

I am using the MySQL world.sql database. Exactly what is in it doesn't matter, but the schema that matters to use looks like:

CREATE TABLE city (
  name char(35),
  country_code char(3),
  population int(11),
);
CREATE TABLE country (
  code char(3),
  name char(52),
  population int(11)
);

The query in question is, in english, "for each country, give me its name and population, along with the name and population for the city who has the highest ratio of its population to the country's population"

Currently I have the following SQL:

SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
WHERE t.population / c.population = (
    SELECT MAX(tt.population / c.population)
    FROM city tt
    WHERE t.country_code = tt.country_code
)

Currently the query takes about 10 minutes to run on my SQLite database. The world.sql database isn't large (4000-5000 rows?) so I'm guessing I'm doing something wrong here.

I currently don't have any sort of indexes or anything: the database is an empty database with this dataset (https://dl.dropboxusercontent.com/u/7997532/world.sql) entered into it. Could anyone give me any pointers as to what I need to fix to make it run in a reasonable amount of time?

EDIT: well here's another twist to the question:

This runs in <2 seconds

    SELECT t.name, t.population, c.name, c.population
    FROM country c
    JOIN city t
    ON t.country_code = c.code
    WHERE t.population * 1.0 / c.population = (
        SELECT MAX(tt.population * 1.0 / c.population)
        FROM city tt
        WHERE tt.country_code = t.country_code
    )

While this take 10 minutes to run

    SELECT t.name, t.population, c.name, c.population
    FROM country c
    JOIN city t
    ON t.country_code = c.code
    AND t.population * 1.0 / c.population = (
        SELECT MAX(tt.population * 1.0 / c.population)
        FROM city tt
        WHERE tt.country_code = t.country_code
    )

Is the solution then to simply stuff as much as possible into the ON clause when i'm doing JOINs? It seems in this case I can get away without an index if I do that...

Upvotes: 1

Views: 108

Answers (3)

Charles Bretana
Charles Bretana

Reputation: 146429

For each country, the city that has the highest ratio of population to it's country's population is the city with the highest population, so try this:

SELECT t.name, t.population, c.name, c.population
FROM country c
   JOIN city t
      ON t.country_code = c.code
         And population = 
              (Select Max(population) from city 
               Where country_code = c.Code)

But this may still not improve performance much... if you have no indicies. You need to put an index on country.code, and on city.country_code

Upvotes: 1

Dan Bracuk
Dan Bracuk

Reputation: 20794

I suggest adding numeric primary keys to both tables and a foreign key on country_code in your city table. One of the benefits will be better performance because primary keys are indexed.

Edit starts here

Since the question doesn't ask you to provide the actual ratio, don't worry about trying to calculate it. The city with the highest population in the country will have the highest proportion of the country's population.

Upvotes: 0

Russ Clarke
Russ Clarke

Reputation: 17909

Ideally, I would first start with indexes and consider adding a computed field that pre-calculates the t.population / c.population into a link table

So for each country and city, you can look up it's ratio of population without computing in RBAR.

Upvotes: 0

Related Questions