Ashok Dongare
Ashok Dongare

Reputation: 541

Work around for intersect keyword in mysql

I have requirement of getting intersection of some results in mysql DB. But after googling came to know that there is no mysql intersect keyword available . Following are my sample tables.

gene table
+------+--------+---------+
| id   | symbol | test_id |
+------+--------+---------+
|   -1 | A      |      -1 |
|    8 | A      |       3 |
|    9 | G      |       3 |
|   -1 | A      |      -1 |
|   -2 | B      |      -1 |
|   -3 | C      |      -1 |
|    1 | A      |       1 |
|    2 | B      |       1 |
|    3 | C      |       1 |
|    4 | B      |       2 |
|    5 | C      |       2 |
|    6 | D      |       2 |
|    7 | E      |       2 |
|    8 | A      |       3 |
|    9 | G      |       3 |
|   10 | F      |       3 |
|   11 | C      |       3 |
|   12 | C      |       4 |
|   13 | G      |       4 |
|   14 | F      |       4 |
|   15 | M      |       4 |
|   16 | N      |       4 |
+------+--------+---------+

test table
+------+-------+
| id   | name  |
+------+-------+
|   -1 | test0 |
|    3 | test3 |
|   -1 | test0 |
|    1 | test1 |
|    2 | test2 |
|    3 | test3 |
|    4 | test4 |
+------+-------+

Now I want to formulate a query which will give me the tests which are common for provided genes. e.g. I will provide gene A, B, C and I should get the following result:

 id    name   id    symbol 
---------------------------    
 -1 | test0 |   -1 | A
 -1 | test0 |   -2 | B
 -1 | test0 |   -3 | C
  1 | test1 |    1 | A
  1 | test1 |    2 | B
  1 | test1 |    3 | C

I just tried to form a query by following way but didn't work, getting empty resultset and if I use 'or' in where clause getting tests for all genes in where clause.

select distinct t.id, t.name, g.id, g.symbol from tests t 
join genes g on t.id = g.test_id 
where g.symbol = 'A' and g.symbol='B' and g.symbol='C';

Please help me to construct the query.

Upvotes: 0

Views: 97

Answers (2)

GarethD
GarethD

Reputation: 69759

The trick is to filter the records with your criteria, then group by test.id to check that it matches all the criteria:

SELECT  t.id
FROM    tests AS t
        INNER JOIN genes AS g
            ON t.id = g.test_id 
WHERE   g.symbol in ('A','B','C')
GROUP BY t.id
HAVING COUNT(DISTINCT g.symbol) = 3;

So the key line is here:

HAVING COUNT(DISTINCT g.symbol) = 3;

If, like test 2, there is only a match on 'B', then the count will return 1 and the test will be excluded. The number of items you are checking for must match the number in the HAVING clause.

If you then need to get the full data out, you just need to join back to your table:

SELECT  t.id, t.name, g.id, g.symbol
FROM    genes AS g
        INNER JOIN
        (   SELECT  t.id, t.name
            FROM    tests AS t
                    INNER JOIN genes AS g
                        ON t.id = g.test_id 
            WHERE   g.symbol in ('A','B','C')
            GROUP BY t.id, t.name
            HAVING COUNT(DISTINCT g.symbol) = 3
        ) t
            ON t.id = g.test_id;

Example on SQL Fiddle

Upvotes: 3

Rahul
Rahul

Reputation: 77866

Change those AND conditions to OR condition like below cause at any point in time g.symbol can hold only one value and not multiple value. that's why you are getting empty result set.

select t.id, t.name, g.id, g.symbol from tests t 
join genes g on t.id = g.test_id 
where (g.symbol = 'A' or g.symbol='B' or g.symbol='C')
and g.test_id = 1;

(OR) use a IN operator like

select t.id, t.name, g.id, g.symbol from tests t 
join genes g on t.id = g.test_id 
where g.symbol in ('A','B','C')
and g.test_id = 1;

Upvotes: 0

Related Questions