Reputation: 478
Say I have the following table, named data
:
ID foo1 foo2 foo3
1 11 22 33
2 22 17 92
3 31 33 53
4 53 22 11
5 43 23 9
I want to select all rows where either foo1
, foo2
or foo3
match either of these columns in the first row. That is, I want all rows where at least one of the foo
s appears also in the first row. In the example above, I want to select rows 1, 2, 3 and 4. I thought that I could use something like
SELECT * FROM data WHERE foo1 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
OR foo2 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
OR foo3 IN (SELECT foo1,foo2,foo3 FROM data WHERE ID=1)
but this does not seem to work. I can, of course, use
WHERE foo1=(SELECT foo1 FROM data WHERE ID=1)
OR foo1=(SELECT foo2 FROM data WHERE ID=1)
OR ...
but that would invlove many lines, and in my real data set there are actually 16 columns, so it will really be a pain in the lower back. Is there a more sophisticated way to do so?
Also, what should I do if I want to count also the number of hits (in the example above, get 4 for row 1, 2 for row 4, and 1 for rows 2,3)?
Upvotes: 1
Views: 1083
Reputation: 125835
SELECT data.*,
(data.foo1 IN (t.foo1, t.foo2, t.foo3))
+ (data.foo2 IN (t.foo1, t.foo2, t.foo3))
+ (data.foo3 IN (t.foo1, t.foo2, t.foo3)) AS number_of_hits
FROM data JOIN data t ON t.id = 1
WHERE data.foo1 IN (t.foo1, t.foo2, t.foo3)
OR data.foo2 IN (t.foo1, t.foo2, t.foo3)
OR data.foo3 IN (t.foo1, t.foo2, t.foo3)
See it on sqlfiddle.
Actually, on reflection, you might consider normalising your data:
CREATE TABLE data_new (
ID BIGINT UNSIGNED NOT NULL,
foo_number TINYINT UNSIGNED NOT NULL,
val INT,
PRIMARY KEY (ID, foo_number),
INDEX (val)
);
INSERT INTO data_new
(ID, foo_number, val)
SELECT ID, 1, foo1 FROM data
UNION ALL SELECT ID, 2, foo2 FROM data
UNION ALL SELECT ID, 3, foo3 FROM data;
DROP TABLE data;
Then you can do:
SELECT ID,
MAX(IF(foo_number=1,val,NULL)) AS foo1,
MAX(IF(foo_number=2,val,NULL)) AS foo2,
MAX(IF(foo_number=3,val,NULL)) AS foo3,
number_of_hits
FROM data_new JOIN (
SELECT d1.ID, COUNT(*) AS number_of_hits
FROM data_new d1 JOIN data_new d2 USING (val)
WHERE d2.ID = 1
GROUP BY d1.ID
) t USING (ID)
GROUP BY ID
See it on sqlfiddle.
As you can see from the execution plan, this will be considerably more efficient for large data sets.
Upvotes: 3
Reputation: 108370
There are several ways to get the result set.
Here's one approach, (if you don't care about which fooN gets matched with with fooN, and also want to return that "first" row).
SELECT DISTINCT d.*
JOIN ( SELECT foo1 AS foo FROM data WHERE id = 1
UNION ALL
SELECT foo2 FROM data WHERE id = 1
UNION ALL
SELECT foo3 FROM data WHERE id = 1
) f
JOIN data d
ON f.foo IN (d.foo1, d.foo2, d.foo3)
That ON clause could also be written like this:
ON d.foo1 = f.foo
OR d.foo2 = f.foo
OR d.foo2 = f.foo
To get a "count" of the hits...
SELECT d.id
, d.foo1
, d.foo2
, d.foo3
, SUM( IFNULL(d.foo1=f.foo,0)
+IFNULL(d.foo2=f.foo,0)
+IFNULL(d.foo3=f.foo,0)
) AS count_of_hits
JOIN ( SELECT foo1 AS foo FROM data WHERE id = 1
UNION ALL
SELECT foo2 FROM data WHERE id = 1
UNION ALL
SELECT foo3 FROM data WHERE id = 1
) f
JOIN data d
ON f.foo IN (d.foo1, d.foo2, d.foo3)
GROUP
BY d.id
, d.foo1
, d.foo2
, d.foo3
eggyal is right, as usual. Getting the count of hits is actually much simpler: we can just use a SUM(1) or COUNT(1) aggregate, no need to run all those comparisons, we've already done all the necessary comparisons.
SELECT d.id
, d.foo1
, d.foo2
, d.foo3
, COUNT(1) AS count_of_hits
JOIN ( SELECT foo1 AS foo FROM data WHERE id = 1
UNION ALL
SELECT foo2 FROM data WHERE id = 1
UNION ALL
SELECT foo3 FROM data WHERE id = 1
) f
JOIN data d
ON f.foo IN (d.foo1, d.foo2, d.foo3)
GROUP
BY d.id
, d.foo1
, d.foo2
, d.foo3
Upvotes: 1