Reputation: 540
A very simplified setup of the problem at hand.
Table A has columns rz_id and sHashA. Table A is very big.
Table B has columns scode and sHashB. There can be many sHashB values corresponding to a particular scode value. Table B is relatively much smaller than table A.
For each of the scode value (about 200 of them) I have to execute a query like the following (scode is 500 in this case).
select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 500);
For each of the scode value I write a query like the above so that I end up with 200 queries like so
select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 500);
select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 501);
select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 502);
.
.
.
select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 700);
The problem is that this ends up going over the big table 200 times which is time consuming. I want to be able to achieve this with a single pass (single query).
I thought of make a table with as many rows as table A and as many additional columns as table B via a query like
select /*+ streamtable(a) */ a.*, if(substr(sHashA, 1, 5) in (select
substr(sHashB, 1, 5) from B where scode = 500, 1, 0) as scode_500,
if(substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where
scode = 501, 1, 0) as scode_501, ... if(substr(sHashA, 1, 5) in
(select substr(sHashB, 1, 5) from B where scode = 700, 1, 0) as
scode_700 from A a;
This would output a 0 or 1 in each of the 200 columns corresponding to scode per row of table A. Later I could sum up the columns to get a count. Since I am also interested in estimating the overlap of counts between any two scodes I thought of the above table.
But I get parse error and I suspect queries are not allowed inside of IF statements.
The question finally is this then: how do i reduce all those queries in to a single query so that I end up going through rows of huge table only once? Please also suggest alternate ways of handling this count keeping in mind that I am also intersted in the overlap.
Upvotes: 1
Views: 1201
Reputation: 674
What about something like this;
select count(distinct A.rz_id), B.scode
from A,B
where substr(A.sHashA, 1, 5) = substr(B.sHashB, 1,5)
and B.scode in (500,501,...)
group by B.scode
Single pass gets all data
Upvotes: 3