Reputation: 3
I have two tables. Table one with 3 category columns. Table 2 with 2 category columns. I tried to write a query that compare these category columns and shows me the data which is in one of the three columns of table 1, but not in one of the two columns of table 2.
e.g.
ID product cat1 cat2 cat3 10001 product a A B C 10001 product a D E NULL 10001 product a F G H 10002 product b B C D ... ... ... ... ...
ID product cat1 cat2 10001 product a D E 10001 product a D F 10001 product a G A 10002 product b A C ... ... ... ... ...
The output of the query should look something like that:
product_id not_in_cat 10001 B 10001 C 10001 H 10002 B 10002 D
But I don't know how I can realize it. I tried to do this with a subquery and the "NOT IN" command. But this way I got many subqueries, for each combination of t1.cat1 - t2.cat1, t1.cat2 - t2.cat2, and so on. And this way I got only the categories which are not in the same row.
Maybe someone can help me to figure out, which way is the best to achieve the desired result. The query is performed on a MS SQL Server with a openquery to a MySQL and a oracle db server. But I think its more a logic problem.
Upvotes: 0
Views: 2197
Reputation: 1269773
I think you should "normalize" the data and do the equivalent of a full outer join using union all
and group by
. This looks like:
select id, cat
from (select id, cat1 as cat, 1 as which from table1 union all
select id, cat2 as cat, 1 as which from table1 union all
select id, cat3 as cat, 1 as which from table1 union all
select id, cat1 as cat, 2 as which from table2 union all
select id, cat2 as cat, 2 as which from table2
) ic
where cat is not null
group by id, cat
having min(which) = max(which);
This finds the categories that are in only one of the tables. Note: If you know that there are no duplicates in either table, then having count(*) = 1
also works.
And, if you want to know the table where the category is present, then ask another question. That is a bit different from the question that you did ask.
Upvotes: 1