Kapil Sharma
Kapil Sharma

Reputation: 10417

MySQL intersection

I've an existing site, whose DB is not designed correctly and contains lot of records, so we cant change DB structure.

Database for current issue mainly contains 4 tables, users, questions, options and answers. There is standard set of questions and options but for each user, there is one row in answers table for each set of question and options. DB structure and example data is available at SQL fiddle.

Now as a new requirement of advanced search, I need to find users by applying multiple search filters. Example input and expected output is given in comments on SQL Fiddle.

I tried to apply all type of joins, intersection but it always fail somehow. Can someone please help me to write correct query, preferably light weight/optimized joins as DB contain lot of records (10000+ users, 100+ questions, 500+ options and 500000+ records in answers table)?

EDIT: Based on two answers, I used following query

SELECT u.id, u.first_name, u.last_name
FROM users u
    JOIN answers a ON a.user_id = u.id
WHERE (a.question_id = 1 AND a.option_id IN (3, 5))
    OR (a.question_id = 2 AND a.option_id IN (8))
GROUP BY u.id, u.first_name, u.last_name
HAVING
    SUM(CASE WHEN (a.question_id = 1 AND a.option_id IN (3, 5)) THEN 1 ELSE 0 END) >=1
    AND SUM(CASE WHEN (a.question_id = 2 AND a.option_id IN (8)) THEN 1 ELSE 0 END) >= 1;

Please note: On real database, columns user_id, question_id and option_id of answers table are indexed.

Running query given on SQL Fiddle.

SQL Fiddle for dnoeth's answer.

SQL Foddle for calcinai's answer.

Upvotes: 1

Views: 273

Answers (2)

dnoeth
dnoeth

Reputation: 60472

Add all you n filters into the WHERE using OR and repeat them in a HAVING(SUM(CASE)) using AND:

SELECT u.id, u.first_name, u.last_name
FROM users u JOIN answers a
  ON a.user_id = u.id
JOIN questions q
  ON a.question_id = q.id
JOIN question_options o
  ON a.option_id = o.id
WHERE (q.question = 'Language known' AND o.OPTION IN ('French','Russian'))
   OR (q.question = 'height' AND o.OPTION = '1.51 - 1.7')
GROUP BY u.id, u.first_name, u.last_name
HAVING
  SUM(CASE WHEN (q.question = 'Language known' AND o.OPTION IN ('French','Russian')) THEN 1 ELSE 0 END) >=1
AND 
  SUM(CASE WHEN (q.question = 'height'         AND o.OPTION = '1.51 - 1.7')          THEN 1 ELSE 0 END) >= 1
;

I changed your joins into the more readable Standard SQL syntax.

Upvotes: 1

calcinai
calcinai

Reputation: 2617

This will require a bit of fiddling for a dynamic filter, but what you really want to do is search by the IDs, as it'll mean less joins and a faster query.

This produces the results you'd expect. I assume that the search filters are generated based off options in the database, so instead of passing the actual value back in to the query, pass the ID instead.

The multiple inner joins are to support multiple AND criteria and auto-reduce your result set.

SELECT * FROM users u
INNER JOIN answers a ON a.user_id = u.id
  AND (a.question_id, a.option_id) IN ((1,3),(1,5)) # q 1: Lang, answer 3/5: En/Ru
INNER JOIN answers a2 ON a2.user_id = u.id
  AND (a2.question_id, a2.option_id) = (2,8) # q 2: Height, answer 8: 1.71...
GROUP BY u.id;

I'd suggest making sure there's an index on (user_id, question_id, option_id) for searching:

ALTER TABLE `answers` ADD INDEX idx_search(`user_id`, `question_id`, `option_id`);

Otherwise it should be using primary keys for the joins (if properly defined) so it will be fast.

Upvotes: 1

Related Questions