Gerard

Reputation: 4848

Characteristics of slow SQL Queries

I was recently asked in an interview why a SELECT query on a MySQL database would be really slow, and I came up with the following:

  1. Multiple JOINs being performed on the select
  2. Absence of Indexes (indices?) on key filter fields

I was also asked for solutions to these problems, and I said:

  1. Denormalise your data if the query is of high importance (I know this leads to data duplication but is there another way to avoid JOINs?)
  2. Add indexes to the filtering columns.
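
For example, a minimal sketch of point 2, with made-up table and column names:

    -- hypothetical orders table, filtered on customer_id and order_date
    CREATE INDEX idx_orders_customer_date
        ON orders (customer_id, order_date);

    -- the index can now serve filters such as:
    SELECT order_id, total
    FROM orders
    WHERE customer_id = 42
      AND order_date >= '2011-01-01';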

Are there other reasons why an SQL query might be inefficient? Please note that I'm purely looking for tips on how to speed up the query itself, so assume that the DB server is flawless :-)

Upvotes: 3

Views: 612

Answers (3)

ravnur

Reputation: 2852

Maybe you will find the following link useful: MySQL-performance-tuning-step-by-step

It describes how to improve MySQL performance at the instance, schema, and query level, and so on.

Upvotes: 0

Chris J

Reputation: 9262

There are a few reasons why a query might be slow. In any event, to really see what the query planner is doing, you should run EXPLAIN on it. The EXPLAIN command on most DBMSs will tell you which indexes, if any, the query planner is going to use, roughly how many rows of data you can expect to get back, and how many rows of data need to be processed before you start getting results.
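
A rough sketch of what that looks like in MySQL (the table, column, and index names here are invented for illustration):

    -- prepend EXPLAIN to the query you want to inspect
    EXPLAIN
    SELECT o.order_id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.status = 'shipped';

    -- the output includes columns such as key (the index chosen),
    -- rows (the estimated number of rows examined) and type
    -- (where ALL means a full table scan)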

Now, to give some specific reasons why a query might run slowly: you are correct about the indexes. The lack of an index will result in sequential scans of the tables in your query, which can make things slow if those tables are large. Creating indexes on the columns that you are joining on or using in your WHERE clause definitely helps. Sometimes, though, the query planner does a poor job and you will need to help it along with an index hint (FORCE INDEX in MySQL) to indicate which index it should use.
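
In MySQL, forcing a particular index looks roughly like this (the index and table names are assumptions for the sketch):

    -- tell the planner to use idx_orders_status instead of choosing on its own
    SELECT o.order_id, o.total
    FROM orders o FORCE INDEX (idx_orders_status)
    WHERE o.status = 'shipped';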

It is a misconception that joining in itself slows things down. Single-level joins are generally fine. For example, say you are selecting data from table A and you are joining B to A and C to A; the joins to tables B and C are single-level joins. Multi-level joins take more time to process. That is why data warehouses and data marts tend to use star schemas: a single large table with metrics (a fact table) that queries run against, plus other tables with descriptive data (dimension tables) that get joined to it. Star schemas avoid multi-level joins, so reporting queries can be performed quickly.
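
A sketch of what such a single-level, star-style query can look like (all table and column names invented for illustration):

    -- fact table joined directly to each dimension table, one level deep
    SELECT d.calendar_date, p.category, SUM(f.amount) AS revenue
    FROM sales_fact f
    JOIN date_dim d ON d.date_key = f.date_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY d.calendar_date, p.category;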

Denormalizing your tables is tempting; however, I would strongly recommend against it. When you start to denormalize your database, you are going to encounter some serious pains down the road should the amount of data you are storing grow ever larger (scaling problems). As well, maintaining denormalized tables requires your engineers to have a really good working knowledge of the schema, which makes maintenance harder (tech debt). Sure, it is a short-term gain, but the long-term pain means you have to have a really good reason to want to do it. Do a few multi-year projects that need to scale and you will really see the pain of denormalization.

Now, depending on your needs, it is oftentimes preferable to have a separate reporting database, data mart, or data warehouse that is built and updated using data from your production database. That gives you a lot more freedom to design schemas that really support the reporting queries you want to run, and it stops you from hacking your production database.

If you are short of resources, a nice alternative to a separate database is temporary tables. A temporary table is a table that exists for the lifetime of your database connection/session. Other connections/sessions cannot see or access it (an isolation quality), and you can use it to store and index data that you want to use in a larger, much more complex query. They are really straightforward to use if you are interacting with the database through a console. If you are working with one programmatically and have a connection pool, I think you might have to drop the table when you are done; I can't quite remember, but cleaning up is never a bad thing.
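
A minimal sketch of the idea in MySQL, with made-up names:

    -- build a small, indexed working set that lives only for this session
    CREATE TEMPORARY TABLE recent_customers (
        customer_id INT NOT NULL,
        PRIMARY KEY (customer_id)
    );

    INSERT INTO recent_customers (customer_id)
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= '2011-01-01';

    -- use it in the larger query; the join can hit the primary key index
    SELECT c.name, o.total
    FROM recent_customers rc
    JOIN customers c ON c.id = rc.customer_id
    JOIN orders o ON o.customer_id = rc.customer_id;

    -- explicit cleanup (the table is also dropped when the session ends)
    DROP TEMPORARY TABLE recent_customers;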

One obvious reason why a query is slow is that you are selecting a large amount of data. If you try cross joining multiple tables that each have several hundred million rows of CHAR(1000) fields, your DBMS might start digging into virtual memory in order to perform the joins. Even with indexes, that can lead to swapping on your disk, and once that starts happening, welcome to slowville.
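
To make the arithmetic concrete (names invented): cross joining two 100-million-row tables produces 10^16 candidate rows, whereas a constrained join with an explicit column list keeps the result bounded:

    -- unconstrained: every row of a paired with every row of b
    -- SELECT * FROM a CROSS JOIN b;

    -- constrained join, selecting only the columns actually needed
    SELECT a.id, b.value
    FROM a
    JOIN b ON b.a_id = a.id
    WHERE a.created_at >= '2011-01-01';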

Selecting a subselect (e.g. SELECT a, b, (SELECT c FROM e WHERE e.id = f.a) FROM f) or using one in your WHERE clause can be really slow too, as that correlated subselect is effectively a query that gets executed for each row of data. Using a subselect in a join does not suffer from that issue; however, you are then essentially joining to a temporary table without an index, and depending on how much data you are retrieving with that subselect, that can slow things down too.
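
A sketch of the difference, with illustrative names:

    -- correlated subselect: re-executed for every row of f
    SELECT f.a, f.b,
           (SELECT e.c FROM e WHERE e.id = f.a) AS c
    FROM f;

    -- subselect in a join: run once, but the derived table has no index
    SELECT f.a, f.b, sub.c
    FROM f
    JOIN (SELECT id, c FROM e) AS sub ON sub.id = f.a;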

The IN operator can also be problematic if your set is very large. Again, a large set is basically a large temporary table with no index, so each time you check whether a particular value is in your set, you are performing a sequential scan.
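
For example (again with made-up names), a huge literal IN list is checked row by row, whereas joining to an indexed table of the same values is not:

    -- every row of orders is compared against the whole literal set
    SELECT * FROM orders
    WHERE customer_id IN (1, 7, 23, 104 /* ...thousands more values... */);

    -- alternative: load the set into an indexed (temporary) table and join
    SELECT o.*
    FROM orders o
    JOIN wanted_customers w ON w.customer_id = o.customer_id;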

Those are the most salient reasons that I can think of right now. There are others, but I think that would go beyond the scope of a Stack Overflow response ;-)

Upvotes: 8

user359040

Reputation:

Non-sargable queries - i.e. the DBMS cannot take advantage of a suitable index even where one exists. Solution - refactor the query to be sargable.
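
For example (column and table names assumed), wrapping an indexed column in a function typically defeats the index, while an equivalent range predicate does not:

    -- non-sargable: the function on order_date prevents index use
    SELECT * FROM orders WHERE YEAR(order_date) = 2011;

    -- sargable rewrite: an index on order_date can be used
    SELECT * FROM orders
    WHERE order_date >= '2011-01-01'
      AND order_date < '2012-01-01';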

Memory-intensive queries, requiring disk caching. Solution - upgrade the server with extra RAM and faster disk access (faster disks, RAID striping, etc.)

Upvotes: 2
