Reputation: 29803
Let's assume I have a table named customer
like this:
+----+------+----------+-----+
| id | name | lastname | age |
+----+------+----------+-----+
| .. | ... | .... | ... |
and I need to perform the following query:
SELECT * FROM customer WHERE ((name = 'john' OR lastname = 'doe') AND age = 21)
I'm aware of how single and multi-column indexes work, so I created these ones:
(name, age)
(lastname, age)
Is that all the indexes I need?
The above condition can be rephrased as:
... WHERE ((name = 'john' AND age = 21) OR (lastname = 'doe' AND age = 21)
but I'm not sure how smart RDBMS are, and if those indexes are the correct ones
Upvotes: 1
Views: 92
Reputation: 656241
Your approach is reasonable. Two factors are essential here:
Postgres can combine multiple indexes very efficiently with bitmap index scans.
B-tree index usage is by far most effective when only leading columns of the index are involved.
If you don't have enough data to measure tests
, you can always whip up a quick test case like this:
CREATE TABLE customer (id int, name text, lastname text, age int);
INSERT INTO customer
SELECT g
, left(md5('foo'::text || g%500) , 3 + ((g%5)^2)::int)
, left(md5('bar'::text || g%1000), 5 + ((g%5)^2)::int)
, ((random()^2) * 100)::int
FROM generate_series(1, 30000) g; -- 30k rows for quick test case
For your query (reformatted):
SELECT *
FROM customer
WHERE (name = 'john' OR lastname = 'doe')
AND age = 21;
I would go with
CREATE INDEX customer_age_name_idx ON customer (age, name);
CREATE INDEX customer_age_lastname_idx ON customer (age, lastname);
However, depending on many factors, a single index with all three columns and age as first may be able to deliver similar performance. The rule of thumb is to create as few indexes as possible and as many as necessary.
CREATE INDEX customer_age_lastname_name_idx ON customer (age, lastname, name);
The check on (age, name)
is potentially slower in this case, but depending on selectivity of the first column it may not matter much.
age
first in the index?This is not very important and needs deeper understanding to explain. But since you ask ...
The order of columns doesn't matter for the 2-column indexes customer_age_name_idx
and customer_age_lastname_idx
. Details and a test-case:
I still put age
first to stay consistent with the 3rd index I suggested customer_age_lastname_name_idx
, where the order of columns does matter in multiple ways:
Most importantly, both your predicates (age, name)
and (age, lastname)
share the column age
. B-tree indexes are (by far) most effective on leading columns, so putting age
first benefits both.
And, less importantly, but still relevant: the size of the index is smaller this way due to data type characteristics, alignment, padding and page layout of index pages.
age
is a 4-byte integer
and must be aligned at multiples of 4 bytes in the data page. text
is of variable length and has no alignment restrictions. Putting the integer first or last is more efficient due to the rules of "column tetris". I added another index on (lastname, age, name)
(age
in the middle!) to the fiddle just to demonstrate it's ~ 10 % bigger. No space lost to additional padding, which results in a smaller index. And size matters.
For the same reasons it would be better to reorder columns in the demo table like this: (id, age, name, lastname)
. If you want to learn why, start here:
Everything I wrote is for the case at hand. If you have other queries / other requirements, the resulting strategy may change.
UNION
query equivalent?Note that a UNION
query may or may not return the same result. It folds duplicate rows, which your original does not. Even if you don't have complete duplicates in your table, you may still see this effect with a subset of columns in the SELECT
list. Do not blindly substitute with a UNION
query. It's not going to be faster anyway.
Upvotes: 1
Reputation: 14525
Turn the OR into two queries UNIONed:
SELECT * FROM Customer WHERE Age = 21 AND Name = 'John'
UNION
SELECT * FROM Customer WHERE Age = 21 AND LastName = 'Doe'
Then create an index over (Age, Name) and another over (Age, LastName).
Upvotes: 0