Reputation: 19896
I have three tables and I would like to query table3
using conditions based on table1
and table2
. Here is the simplified version of the data and queries:
CREATE TABLE table1 (
id int
);
INSERT INTO table1 VALUES(1);
INSERT INTO table1 VALUES(2);
INSERT INTO table1 VALUES(3);
+------------+--+
| table1.id |
+------------+--+
| 1 |
| 2 |
| 3 |
+------------+--+
CREATE TABLE table2 (
code varchar(10)
);
INSERT INTO table2 VALUES('a');
INSERT INTO table2 VALUES('b');
INSERT INTO table2 VALUES('c');
+--------------+--+
| table2.code |
+--------------+--+
| a |
| b |
| c |
+--------------+--+
CREATE TABLE table3 (
id int,
code varchar(10)
);
INSERT INTO table3 VALUES(1,'d');
INSERT INTO table3 VALUES(1,'a');
INSERT INTO table3 VALUES(2,'b');
INSERT INTO table3 VALUES(2,'e');
INSERT INTO table3 VALUES(4,'a');
INSERT INTO table3 VALUES(4,'d');
+------------+--------------+--+
| table3.id | table3.code |
+------------+--------------+--+
| 1 | d |
| 1 | a |
| 2 | b |
| 2 | e |
| 4 | a |
| 4 | d |
+------------+--------------+--+
Basically, I like to get the records from table3
only if id
exists in table1
and code
not exists in table2
. So the result should be just
1,d
2,e
This below query doesn't work:
SELECT * FROM table3 WHERE (table3.id IN (SELECT table1.id FROM
table1)) AND NOT (table3.code IN (SELECT table2.code FROM table2));
I got this error:
Error: Error while compiling statement: FAILED: SemanticException [Error 10249]: Line 1:94 Unsupported SubQuery Expression 'code': Only 1 SubQuery expression is supported. (state=42000,code=10249)
Independently, each condition works fine:
SELECT * FROM table3 WHERE (table3.id IN (SELECT table1.id FROM table1));
+------------+--------------+--+
| table3.id | table3.code |
+------------+--------------+--+
| 1 | d |
| 1 | a |
| 2 | b |
| 2 | e |
+------------+--------------+--+
SELECT * FROM table3 WHERE NOT (table3.code IN (SELECT table2.code FROM table2));
+------------+--------------+--+
| table3.id | table3.code |
+------------+--------------+--+
| 1 | d |
| 2 | e |
| 4 | d |
+------------+--------------+--+
IMPORTANT: I cannot do JOIN
or modify anything in FROM
as this is a part of a reporting system so the only thing I can do is to tweak WHERE clause.
Upvotes: 4
Views: 8956
Reputation: 1269463
NOT IN
is a simpler way to write your query:
SELECT t3.*
FROM table3 t3
WHERE t3.id IN (SELECT table1.id FROM table1) AND
t3.code NOT IN (SELECT table2.code FROM table2);
If you are limited to one subquery, this gets tricky. Here is one possibility, although I'm not sure if hive will accept it:
where exists (select 1
from table1 t1
where t1.id = t3.id and
not exists (select 1
from table2 t2
where t2.code = t3.code
)
)
You can do this without the double nesting:
where exists (select 1
from table1 t1 left join
table2 t2
on t2.code = t3.code
where t1.id = t3.id
)
Upvotes: 0
Reputation: 175566
You could rewrite it using JOIN
s:
SELECT DISTINCT t3.*
FROM table3 t3
JOIN table1 t1
ON t3.id = t1.id
LEFT JOIN table2 t2
ON t2.code = t3.code
WHERE t2.code IS NULL;
the only thing I can do is to tweak WHERE clause.
SELECT *
FROM Table3 t
WHERE EXISTS (SELECT 1
FROM table3 t3
JOIN table1 t1
ON t3.id = t1.id
LEFT JOIN table2 t2
ON t2.code = t3.code
WHERE t2.code IS NULL
AND t3.id = t.id
AND t3.code = t.code)
Upvotes: 2
Reputation: 311018
One dirty trick you can use is to cross join table1
and table2
, since you don't care about their relationship anyway, and use an exists
condition:
SELECT *
FROM table3
WHERE NOT EXISTS (SELECT *
FROM table1
CROSS JOIN table2
WHERE table3.id = table1.id ON table3.code = table2.code)
EDIT:
While the above query should work, its performance probably won't be great. A slightly faster variant would be to use union all
in the subquery:
SELECT *
FROM table3
WHERE NOT EXISTS (SELECT *
FROM table1
WHERE table3.id = table1.id
UNION ALL
SELECT *
FROM table2
WHERE table3.code = table2.code)
Upvotes: 1