Selecting rows with multiple values from other rows with mySQL

Question

Say I have the following table, named data:

ID   foo1     foo2    foo3
1    11       22      33
2    22       17      92
3    31       33      53
4    53       22      11
5    43       23      9

I want to select all rows where either foo1, foo2 or foo3 match either of these columns in the first row. That is, I want all rows where at least one of the foos appears also in the first row. In the example above, I want to select rows 1, 2, 3 and 4. I thought that I could use something like

SELECT * FROM data WHERE foo1 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
                      OR foo2 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
                      OR foo3 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)

but this does not seem to work. I can, of course, use

WHERE foo1=(SELECT foo1 FROM data WHERE ID=1) 
   OR foo1=(SELECT foo2 FROM data WHERE ID=1) 
   OR ...

but that would invlove many lines, and in my real data set there are actually 16 columns, so it will really be a pain in the lower back. Is there a more sophisticated way to do so?

Also, what should I do if I want to count also the number of hits (in the example above, get 4 for row 1, 2 for row 4, and 1 for rows 2,3)?

eggyal · Accepted Answer

SELECT data.*,
      (data.foo1 IN (t.foo1, t.foo2, t.foo3))
    + (data.foo2 IN (t.foo1, t.foo2, t.foo3))
    + (data.foo3 IN (t.foo1, t.foo2, t.foo3)) AS number_of_hits
FROM   data JOIN data t ON t.id = 1
WHERE  data.foo1 IN (t.foo1, t.foo2, t.foo3)
    OR data.foo2 IN (t.foo1, t.foo2, t.foo3)
    OR data.foo3 IN (t.foo1, t.foo2, t.foo3)

See it on sqlfiddle.

Actually, on reflection, you might consider normalising your data:

CREATE TABLE data_new (
  ID         BIGINT  UNSIGNED NOT NULL,
  foo_number TINYINT UNSIGNED NOT NULL,
  val        INT,
  PRIMARY KEY (ID, foo_number),
  INDEX (val)
);

INSERT INTO data_new
  (ID, foo_number, val)
          SELECT ID, 1, foo1 FROM data
UNION ALL SELECT ID, 2, foo2 FROM data
UNION ALL SELECT ID, 3, foo3 FROM data;

DROP TABLE data;

Then you can do:

SELECT   ID,
         MAX(IF(foo_number=1,val,NULL)) AS foo1,
         MAX(IF(foo_number=2,val,NULL)) AS foo2,
         MAX(IF(foo_number=3,val,NULL)) AS foo3,
         number_of_hits
FROM     data_new JOIN (
  SELECT   d1.ID, COUNT(*) AS number_of_hits
  FROM     data_new d1 JOIN data_new d2 USING (val)
  WHERE    d2.ID = 1
  GROUP BY d1.ID
) t USING (ID)
GROUP BY ID

See it on sqlfiddle.

As you can see from the execution plan, this will be considerably more efficient for large data sets.

Selecting rows with multiple values from other rows with mySQL

Answers (2)

Related Questions