combine multiple select queries into one to avoid multiple pass over a huge table

Question

A very simplified setup of the problem at hand.

Table A has columns rz_id and sHashA. Table A is very big.

Table B has columns scode and sHashB. There can be many sHashB values corresponding to a particular scode value. Table B is relatively much smaller than table A.

For each of the scode value (about 200 of them) I have to execute a query like the following (scode is 500 in this case).

select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 500);

For each of the scode value I write a query like the above so that I end up with 200 queries like so

select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 500);
select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 501);
select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 502);
.
.
.
select count(distinct rz_id) from A where substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where scode = 700);

The problem is that this ends up going over the big table 200 times which is time consuming. I want to be able to achieve this with a single pass (single query).

I thought of make a table with as many rows as table A and as many additional columns as table B via a query like

select /*+ streamtable(a) */ a.*, if(substr(sHashA, 1, 5) in (select
substr(sHashB, 1, 5) from B where scode = 500, 1, 0) as scode_500,
if(substr(sHashA, 1, 5) in (select substr(sHashB, 1, 5) from B where
scode = 501, 1, 0) as scode_501, ... if(substr(sHashA, 1, 5) in
(select substr(sHashB, 1, 5) from B where scode = 700, 1, 0) as
scode_700 from A a;

This would output a 0 or 1 in each of the 200 columns corresponding to scode per row of table A. Later I could sum up the columns to get a count. Since I am also interested in estimating the overlap of counts between any two scodes I thought of the above table.

But I get parse error and I suspect queries are not allowed inside of IF statements.

The question finally is this then: how do i reduce all those queries in to a single query so that I end up going through rows of huge table only once? Please also suggest alternate ways of handling this count keeping in mind that I am also intersted in the overlap.

Serif Emek · Accepted Answer

What about something like this;

select count(distinct A.rz_id), B.scode
from A,B
where substr(A.sHashA, 1, 5) = substr(B.sHashB, 1,5)
and B.scode in (500,501,...)
group by B.scode

Single pass gets all data

combine multiple select queries into one to avoid multiple pass over a huge table

Answers (1)

Related Questions