Reputation: 59323
Today I made some changes to a table in an attempt to make certain kinds of queries run much faster. This is the table (before I changed it):
CREATE TABLE IF NOT EXISTS street_addresses (
id INTEGER PRIMARY KEY NOT NULL,
house_number INTEGER NOT NULL,
entrance TEXT NOT NULL,
latitude REAL NOT NULL,
longitude REAL NOT NULL,
street_name INTEGER NOT NULL REFERENCES street_names(id),
postal_code INTEGER NOT NULL REFERENCES postal_codes(id),
city INTEGER NOT NULL REFERENCES cities(id),
municipality INTEGER NOT NULL REFERENCES municipalities(id),
CONSTRAINT unique_address UNIQUE(
street_name, house_number, entrance, postal_code, city
)
)
This table has two indexes (that I can identify): the Primary Key and the unique key across 5 columns. I often need to query for a street address using just the house number and postal code columns, or the house number and city columns, so I changed the table creation SQL to this:
CREATE TABLE IF NOT EXISTS street_addresses (
id INTEGER PRIMARY KEY NOT NULL,
house_number INTEGER NOT NULL,
entrance TEXT NOT NULL,
latitude REAL NOT NULL,
longitude REAL NOT NULL,
street_name INTEGER NOT NULL REFERENCES street_names,
postal_code INTEGER NOT NULL REFERENCES postal_codes,
city INTEGER NOT NULL REFERENCES cities,
municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE INDEX IF NOT EXISTS sa_hn_pc
ON street_addresses (house_number, postal_code);
CREATE INDEX IF NOT EXISTS sa_hn_ci
ON street_addresses (house_number, city);
CREATE UNIQUE INDEX IF NOT EXISTS sa_unique_address
ON street_addresses (
street_name, house_number, entrance, postal_code, city
);
I added two indexes and moved the UNIQUE index out of the table definition (so that I have all my keys in one place.) Additionally, I removed the (id)
from the REFERENCES
lines, because according to the docs it defaults to using the primary key anyway. My database is now significantly larger, but at least fetching addresses using house number and postal-code is tens of times faster!
Unfortunately, queries searching by street name and house number, which is the most usual kind of query for my database, doesn't appear to be using my indexes anymore. Before the table change I got ~1700 reads per second using street name and house number, now I get ~50. If I search using all 5 columns I still get the good old speed, but using only the first 2 columns in the UNIQUE key is now extremely slow.
Additionally, queries using house number and city is still almost as slow as before, much much slower than searching using house number and postal code.
Any idea how this happened? Will I need to define a new index for street name and house number, even though those columns are part of the UNIQUE key? If so, why were my queries so fast before? Also, why didn't house-number-and-city queries enjoy the same speed boost as house-number-and-postal-code queries did?
Sorry for the wall of text. I hope somebody can help. Here is the select query I'm using:
Before table change:
$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number [ ============================ 100% (10000/10000) ============================ ] 5.9129 seconds 0.0006 seconds per interval 1691 intervals per second $ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number --entrance --postal_code --city [ ============================ 100% (10000/10000) ============================ ] 3.2198 seconds 0.0003 seconds per interval 3106 intervals per second $ bin/benchmark_norway_database --search-by-components 100 --house_number --postal_code [ ============================== 100% (100/100) ============================== ] 9.957 seconds 0.0996 seconds per interval 10 intervals per second $ bin/benchmark_norway_database --search-by-components 100 --house_number --city [ ============================== 100% (100/100) ============================== ] 10.2446 seconds 0.1024 seconds per interval 10 intervals per second
After table change:
# This is now so dreadfully slow I can't do 10000 intervals. $ bin/benchmark_norway_database --search-by-components 500 --street_name --house_number [ ============================== 100% (500/500) ============================== ] 9.5749 seconds 0.0191 seconds per interval 52 intervals per second # Still fast! $ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number --entrance --postal_code --city [ ============================ 100% (10000/10000) ============================ ] 3.4125 seconds 0.0003 seconds per interval 2930 intervals per second # Much, much faster than before! $ bin/benchmark_norway_database --search-by-components 10000 --house_number --postal_code [ ============================ 100% (10000/10000) ============================ ] 22.2646 seconds 0.0022 seconds per interval 449 intervals per second # Still slow? Why? :S $ bin/benchmark_norway_database --search-by-components 500 --house_number --city [ ============================== 100% (500/500) ============================== ] 14.3483 seconds 0.0287 seconds per interval 35 intervals per second
SELECT
sn.name, sa.house_number, sa.entrance, pc.postal_code,
ci.name, mu.name, co.name, sa.latitude, sa.longitude
FROM
street_addresses AS sa
INNER JOIN street_names AS sn ON sa.street_name = sn.id
INNER JOIN postal_codes AS pc ON sa.postal_code = pc.id
INNER JOIN cities AS ci ON sa.city = ci.id
INNER JOIN municipalities AS mu ON sa.municipality = mu.id
INNER JOIN counties AS co ON mu.county = co.id
WHERE
...
ORDER BY
ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT
0, 100
Note: In the WHERE
section, I use GLOB when searching for street names, e.g.:
WHERE
sn.name GLOB "FORNEBUVEIEN" AND
sa.house_number = 11
CREATE TABLE IF NOT EXISTS counties (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT UNIQUE NOT NULL
)
CREATE TABLE IF NOT EXISTS municipalities (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
number INTEGER NOT NULL,
county INTEGER NOT NULL REFERENCES counties,
CONSTRAINT unique_municipality UNIQUE(name, county)
);
CREATE UNIQUE INDEX IF NOT EXISTS mu_number
ON municipalities (number);
CREATE UNIQUE INDEX IF NOT EXISTS mu_unique_name_co
ON municipalities (name, county);
CREATE TABLE IF NOT EXISTS cities (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE UNIQUE INDEX IF NOT EXISTS ci_unique_name_mu
ON cities (name, municipality);
CREATE TABLE IF NOT EXISTS postal_codes (
id INTEGER PRIMARY KEY NOT NULL,
postal_code INTEGER NOT NULL,
city INTEGER NOT NULL REFERENCES cities
);
CREATE UNIQUE INDEX IF NOT EXISTS po_postal_code
ON postal_codes (postal_code);
CREATE TABLE IF NOT EXISTS street_names (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS sn_name
ON street_names (name);
CREATE TABLE IF NOT EXISTS street_addresses (
id INTEGER PRIMARY KEY NOT NULL,
house_number INTEGER NOT NULL,
entrance TEXT NOT NULL,
latitude REAL NOT NULL,
longitude REAL NOT NULL,
street_name INTEGER NOT NULL REFERENCES street_names,
postal_code INTEGER NOT NULL REFERENCES postal_codes,
city INTEGER NOT NULL REFERENCES cities,
municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE INDEX IF NOT EXISTS sa_hn_pc
ON street_addresses (house_number, postal_code);
CREATE INDEX IF NOT EXISTS sa_hn_ci
ON street_addresses (house_number, city);
CREATE UNIQUE INDEX IF NOT EXISTS sa_unique_address
ON street_addresses (
street_name, house_number, entrance, postal_code, city
);
PRAGMA journal_mode = OFF
PRAGMA page_size = 65536
VACUUM
sqlite> EXPLAIN QUERY PLAN SELECT sn.name, sa.house_number, sa.entrance, pc.postal_code, ci.name, mu.name, co.name, sa.latitude, sa.longitude FROM street_addresses AS sa INNER JOIN street_names AS sn ON sa.street_name = sn.id INNER JOIN postal_codes AS pc ON sa.postal_code = pc.id INNER JOIN cities AS ci ON sa.city = ci.id INNER JOIN municipalities AS mu ON sa.municipality = mu.id INNER JOIN counties AS co ON mu.county = co.id WHERE sn.name GLOB "FORNEBUVEIEN" AND sa.house_number=11 ORDER BY ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC LIMIT 0, 100; selectid order from detail ---------- ---------- ---------- ------------------------------------------------------------------------- 0 0 0 SEARCH TABLE street_addresses AS sa USING INDEX sa_hn_ci (house_number=?) 0 1 1 SEARCH TABLE street_names AS sn USING INTEGER PRIMARY KEY (rowid=?) 0 2 2 SEARCH TABLE postal_codes AS pc USING INTEGER PRIMARY KEY (rowid=?) 0 3 3 SEARCH TABLE cities AS ci USING INTEGER PRIMARY KEY (rowid=?) 0 4 4 SEARCH TABLE municipalities AS mu USING INTEGER PRIMARY KEY (rowid=?) 0 5 5 SEARCH TABLE counties AS co USING INTEGER PRIMARY KEY (rowid=?) 0 0 0 USE TEMP B-TREE FOR ORDER BY
Upvotes: 3
Views: 202
Reputation: 59323
It turns out that when using a WHERE
section like this in my SELECT
query:
WHERE
sn.name GLOB ? AND
sa.house_number = ?
SQLite3 picks the index sa_hn_ci
(house_number, city) rather than sa_unique_address
. That made the query run approximately 100 times slower.
I am now working around the issue by using INDEXED BY
whenever my query includes a street name:
SELECT
sn.name, sa.house_number, sa.entrance, pc.postal_code,
ci.name, mu.name, co.name, sa.latitude, sa.longitude
FROM
street_addresses AS sa INDEXED BY sa_unique_address -- This line!
INNER JOIN street_names AS sn ON sa.street_name = sn.id
INNER JOIN postal_codes AS pc ON sa.postal_code = pc.id
INNER JOIN cities AS ci ON sa.city = ci.id
INNER JOIN municipalities AS mu ON sa.municipality = mu.id
INNER JOIN counties AS co ON mu.county = co.id
WHERE
sn.name GLOB "FORNEBUVEIEN" AND
sa.house_number=11
ORDER BY
ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT
0, 100;
But I don't know why SQLite3 picked the wrong index to begin with. Running ANALYZE
didn't change anything.
I'm not marking this answer as correct.
Upvotes: 0