Forexlead
Forexlead

Reputation: 11

Most Performant SQL

I have a table with 3 columns:

item_id (decimal), key_name (varchar), key_string_value (varchar)

It is populated with hundreds and thousands of rows. Below are the first 6 rows to give you an idea of the data.

 1. 1              product              product1
 2. 1              topic                topic1
 3. 1              segment              segment1
 4. 2              product              product2
 5. 2              topic                topic1
 6. 2              segment              segment1

Each distinct item_id has 3 rows associated with it and describing it with metadata like topic, product, segment. I am trying to write a SQL query which does the following and performs extremely well.

Now I need to select all item_ids which will suffice a certain combination of these metadata for example:

I am trying both self joins and correlated queries but am not sure which direction should I continue for the best performance. Any help would be appreciated.

Upvotes: 1

Views: 133

Answers (3)

Eugen Rieck
Eugen Rieck

Reputation: 65304

First of all: I don't think this has a well-performing solution without an index on item_id, so I think you should change your PK to a composite of (item_id, key_name) - you need this to be unique anyway.

Assuming you have done so, we now have a fast track to both item_id and key_name, which I consider to be a necessity.

We can now try a self join, which should give good results on MySQL, as in InnoDB all composite indices are clustered:

SELECT
  one.item_id 
FROM table_name AS one
INNER JOIN table_name AS two 
  ON two.item_id=one.item_id
INNER JOIN table_name AS three 
  ON three.item_id=one.item_id
WHERE one.key_name='product'
AND one.key_string_value='product1'
AND two.key_name='topic'
AND two.key_string_value='topic1'
AND three.key_name='segment'
AND three.key_string_value='segment1'
;

It is important to apply the selector with the highest selectivity to the driving table. i.e. one - in my query I have assumed, that product has a higher selectivity than topic or segment.

Upvotes: 0

John Bollinger
John Bollinger

Reputation: 181008

Correlated subqueries are usually about the least performant solution to any problem unless the DB can transform them into joins, as some query planners can do in some cases. In any optimization process, though, it is wise to check and compare the plans the DB actually comes up with for different queries and different conditions, using the DB's mechanism for that (e.g. EXPLAIN PLAN in some databases).

My first pass at this task would take this form:

SELECT product.item_id
FROM
  (
    SELECT item_id
    FROM my_table
    WHERE key_name = 'product' AND key_string_value = 'product2'
  ) product
  JOIN (
    SELECT item_id
    FROM my_table
    WHERE key_name = 'topic' AND key_string_value = 'topic1'
  ) topic
    ON product.item_id = topic.item_id
  JOIN (
    SELECT item_id
    FROM my_table
    WHERE key_name = 'segment' AND key_string_value = 'segment1'
  ) segment
    ON topic.item_id = segment.item_id

That assumes the table has a primary key constraint -- or at least a uniqueness constraint -- on (item_id, key_name); if it doesn't then the inline views should use SELECT DISTINCT. Additionally, it may benefit significantly from an index on (key_name), or even more from an index on (key_name, key_string_value).

Before you go crazy creating indexes, though, be sure you understand that though they may speed queries, they do slow inserts, updates, and deletes, and they increase the footprint of your database. And again, check the query plans, and test the performance.

Update: Given that we now find there is indeed an index on (key_name, key_string_value), and that the wide variety of key_string_values makes such an index highly selective, I'm inclined to think that the above approach will do quite well. When you test, don't forget to SELECT DISTINCT if you cannot otherwise rely on the inline views to avoid providing duplicate item_ids.

Upvotes: 1

Hart CO
Hart CO

Reputation: 34784

You can use HAVING and conditional aggregation for this:

SELECT item_id
FROM YourTable
GROUP BY item_id
HAVING MAX(CASE WHEN key_name = 'topic' AND key_string_value ='topic1' THEN 1 END) = 1
  AND  MAX(CASE WHEN key_name = 'product' AND key_string_value ='product2' THEN 1 END) = 1
  AND  MAX(CASE WHEN key_name = 'segment' AND key_string_value ='segment1' THEN 1 END) = 1

Is it the most performant? Will have to test, but I'd expect it to beat self-joins and correlated subqueries.

Upvotes: 1

Related Questions