Roger
Roger

Reputation:

How can I speed up this SELECT CONCAT/GROUP BY query?

I'm working on selecting locations (city, state) out of a database. The problem is that the query is running a tad slow and I'm not sure how to speed it up. For example:

SELECT CONCAT_WS(', ', city, state) as location, AVG(latitude), AVG(longitude) 
FROM places
WHERE city='New York' AND state='NY'
GROUP BY location

There's going to be a CONCAT on the location regardless, because I want the database to return a pretty, concatenated version of the location (unless there's reason to do this in the code instead). For example, "New York, NY". In reality, a third column is sometimes thrown into the mix (zipcode). I'm running on MySQL.

What would be the best way to optimize this query?

Also, as a secondary question, would adding "DISTINCT" slow down the query in any way? For example:

SELECT DISTINCT CONCAT_WS(', ', city, state) as location, AVG(latitude), AVG(longitude) 
FROM places
WHERE city='New York' AND state='NY'
GROUP BY location

(I'm currently doing this now, but in the process of asking this question, I realized that DISTINCT was not necessary due to the GROUP BY clause; however, since it is unnecessary I wonder if it makes any difference at all and if I should bother rocking the boat in order to speed up the query.)

Edit: There's already an index on city, state and zipcode; plus their combinations therein (city, zipcode; and state/zipcode alone).

Upvotes: 1

Views: 3872

Answers (4)

paxdiablo
paxdiablo

Reputation: 881973

It's funny, but almost every problem people have with databases is speed, not storage requirements. That should tell you something :-)

We've had problems like this before and I've said this many times: per-row functions generally do not scale well. The best method we've found to fix them is with insert/update triggers (I'm assuming MySQL has these).

Create another column call pretty_city_state (or whatever) and have the triggers populate it from city and state whenever you insert or update a row. Then create an index on it.

This takes advantage of the fact that database rows are generally read far more often than they're written (especially in this case). By evaluating that column on write, you bear the cost across the writes (thousands) rather than the reads (millions probably). And it's the writing when it should be borne simply because pretty_city_state will only change when either city or state changes. If you do the concat on every select, you're wasting effort.

Try that and measure the difference - I'm sure you'll find that your selects will scream along at a minimal cost for the triggers (and that cost disappears totally once you have all cities and states within your database.

And yes, I know this breaks 3NF. It's perfectly acceptable to do so for performance reasons if you know what you're doing.

Your query could be done as:

SELECT pretty_city_state as location, AVG(latitude), AVG(longitude) 
FROM places
WHERE city='New York' AND state='NY'
GROUP BY pretty_city_state

or, maybe even faster (measure, don't guess) if you can concatenate the city and state before starting the query:

SELECT pretty_city_state as location, AVG(latitude), AVG(longitude) 
FROM places
WHERE pretty_city_state ='New York, NY'
GROUP BY pretty_city_state

Upvotes: 2

Nicolas
Nicolas

Reputation: 2186

Adding an index on fields "city" and "state" will help.

Also, depending on the cardinality of each field (number of distinct values), on the version of MySQL, the table engine and other parameters, inverting the WHERE clauses may have an impact on the execution time of your query. I'd try:

WHERE state='NY' AND city='New York'

Upvotes: 0

Quassnoi
Quassnoi

Reputation: 425593

Create a composite index on (state, city) and rewrite your query as this:

SELECT  CONCAT_WS(', ', city, state) AS location, AVG(latitude), AVG(longitude) 
FROM    places
WHERE   state='NY'
        AND city='New York'
GROUP BY
        state, city

Note that for this very query you may omit GROUP BY clause:

SELECT  'New York, NY' AS location, AVG(latitude), AVG(longitude) 
FROM    places
WHERE   state='NY'
        AND city='New York'

However, this query will still need it:

SELECT  CONCAT_WS(', ', city, state) AS location, AVG(latitude), AVG(longitude) 
FROM    places
WHERE   state='NY'
GROUP BY
        state, city

Upvotes: 4

Daniel A. White
Daniel A. White

Reputation: 190966

One way to best optimize a query like this is to have those columns set as Index columns. That way it can easily sort/group based off of a tree or hash. Also, concatenation of strings might have some implications too.

Upvotes: 0

Related Questions