yohai
yohai

Reputation: 478

Selecting rows with multiple values from other rows with mySQL

Say I have the following table, named data:

ID   foo1     foo2    foo3
1    11       22      33
2    22       17      92
3    31       33      53
4    53       22      11
5    43       23      9

I want to select all rows where either foo1, foo2 or foo3 match either of these columns in the first row. That is, I want all rows where at least one of the foos appears also in the first row. In the example above, I want to select rows 1, 2, 3 and 4. I thought that I could use something like

SELECT * FROM data WHERE foo1 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
                      OR foo2 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
                      OR foo3 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)

but this does not seem to work. I can, of course, use

WHERE foo1=(SELECT foo1 FROM data WHERE ID=1) 
   OR foo1=(SELECT foo2 FROM data WHERE ID=1) 
   OR ...

but that would invlove many lines, and in my real data set there are actually 16 columns, so it will really be a pain in the lower back. Is there a more sophisticated way to do so?

Also, what should I do if I want to count also the number of hits (in the example above, get 4 for row 1, 2 for row 4, and 1 for rows 2,3)?

Upvotes: 1

Views: 1083

Answers (2)

eggyal
eggyal

Reputation: 125835

SELECT data.*,
      (data.foo1 IN (t.foo1, t.foo2, t.foo3))
    + (data.foo2 IN (t.foo1, t.foo2, t.foo3))
    + (data.foo3 IN (t.foo1, t.foo2, t.foo3)) AS number_of_hits
FROM   data JOIN data t ON t.id = 1
WHERE  data.foo1 IN (t.foo1, t.foo2, t.foo3)
    OR data.foo2 IN (t.foo1, t.foo2, t.foo3)
    OR data.foo3 IN (t.foo1, t.foo2, t.foo3)

See it on sqlfiddle.

Actually, on reflection, you might consider normalising your data:

CREATE TABLE data_new (
  ID         BIGINT  UNSIGNED NOT NULL,
  foo_number TINYINT UNSIGNED NOT NULL,
  val        INT,
  PRIMARY KEY (ID, foo_number),
  INDEX (val)
);

INSERT INTO data_new
  (ID, foo_number, val)
          SELECT ID, 1, foo1 FROM data
UNION ALL SELECT ID, 2, foo2 FROM data
UNION ALL SELECT ID, 3, foo3 FROM data;

DROP TABLE data;

Then you can do:

SELECT   ID,
         MAX(IF(foo_number=1,val,NULL)) AS foo1,
         MAX(IF(foo_number=2,val,NULL)) AS foo2,
         MAX(IF(foo_number=3,val,NULL)) AS foo3,
         number_of_hits
FROM     data_new JOIN (
  SELECT   d1.ID, COUNT(*) AS number_of_hits
  FROM     data_new d1 JOIN data_new d2 USING (val)
  WHERE    d2.ID = 1
  GROUP BY d1.ID
) t USING (ID)
GROUP BY ID

See it on sqlfiddle.

As you can see from the execution plan, this will be considerably more efficient for large data sets.

Upvotes: 3

spencer7593
spencer7593

Reputation: 108370

There are several ways to get the result set.

Here's one approach, (if you don't care about which fooN gets matched with with fooN, and also want to return that "first" row).

SELECT DISTINCT d.* 
  JOIN ( SELECT foo1 AS foo FROM data WHERE id = 1
          UNION ALL
         SELECT foo2 FROM data WHERE id = 1
          UNION ALL
         SELECT foo3 FROM data WHERE id = 1
       ) f
  JOIN data d
    ON  f.foo IN (d.foo1, d.foo2, d.foo3)

That ON clause could also be written like this:

    ON d.foo1 = f.foo
    OR d.foo2 = f.foo
    OR d.foo2 = f.foo

To get a "count" of the hits...

SELECT d.id
     , d.foo1
     , d.foo2
     , d.foo3
     , SUM( IFNULL(d.foo1=f.foo,0)
           +IFNULL(d.foo2=f.foo,0)
           +IFNULL(d.foo3=f.foo,0)
       ) AS count_of_hits
  JOIN ( SELECT foo1 AS foo FROM data WHERE id = 1
          UNION ALL
         SELECT foo2 FROM data WHERE id = 1
          UNION ALL
         SELECT foo3 FROM data WHERE id = 1
       ) f
  JOIN data d
    ON  f.foo IN (d.foo1, d.foo2, d.foo3)
 GROUP
    BY d.id
     , d.foo1
     , d.foo2
     , d.foo3


eggyal is right, as usual. Getting the count of hits is actually much simpler: we can just use a SUM(1) or COUNT(1) aggregate, no need to run all those comparisons, we've already done all the necessary comparisons.

SELECT d.id
     , d.foo1
     , d.foo2
     , d.foo3
     , COUNT(1) AS count_of_hits
  JOIN ( SELECT foo1 AS foo FROM data WHERE id = 1
          UNION ALL
         SELECT foo2 FROM data WHERE id = 1
          UNION ALL
         SELECT foo3 FROM data WHERE id = 1
       ) f
  JOIN data d
    ON  f.foo IN (d.foo1, d.foo2, d.foo3)
 GROUP
    BY d.id
     , d.foo1
     , d.foo2
     , d.foo3

Upvotes: 1

Related Questions