SQL Query joining inner selects

Question

I am probably doing this the wrong way... but here's what I've got... it's an Oracle instance with a data model like so:

TABLE 1
-----------
ID | NAME |
-----------

TABLE 2
----------------------------------------------
ID | NAME | TABLE_1_ID | TIMESTAMP | SOME_NUM
----------------------------------------------

TABLE 3
-----------------------
ID | NAME | TABLE_1_ID
-----------------------

I'd like a query that gives me 4 columns:

t1.name
a count of all entries in table 2 where SOME_NUM = 1 and t1.id = t2.table_1_id
a count of all entries in table 2 where TIMESTAMP is between two values and t1.id = t2.table_1_id
a count of all entries in table 3 where t1.id = t3.table_1_id

I expect Table 1 to be small compared to the other two tables.

Here's what I did to generate the first two sets:

select t1.name, a.count
from 
   ( select TABLE_1_ID, COUNT(*) count
     from TABLE_2
     group by TABLE_1_ID
   ) a,
table_1 t1
where t1.id = a.table_1_id

This worked... but when I try to get the third column, this fails.

select t1.name, a.count, b.count2
from 
   ( select TABLE_1_ID, COUNT(*) count
     from TABLE_2
     group by TABLE_1_ID
   ) a,
   ( select TABLE_1_ID, COUNT(*) count2
     from TABLE_2
     where TIMESTAMP < ? and TIMESTAMP > ?
     group by TABLE_1_ID
   ) b,

table_1 t1
where t1.id = a.table_1_id and t1.id = b.table_1_id

This doesn't return any entries that do not satisfy the where clause regarding timestamps. If I change the and to an or, I get repeated t1.names.

I'm not very good at SQL as you can tell. I think it's doing an inner join here, but I think I want an outer join since I would like a result set with the same number of rows as TABLE_1. I don't even want to try for the 4th result until understanding how to properly get the third.

RichardTheKiwi · Accepted Answer

There is suggestion that the 2nd form is preferable, but for all things performance, that depends on your data - use what works best for you

Option 1:

Instead of JOINs, simply present them as subqueries that present a scalar value for the columns.

i.e. up to 3rd column

select
   t1.name,
   ( select COUNT(*)
     from TABLE_2 a
     WHERE a.TABLE_1_ID = t1.ID
   ) a_count,
   ( select COUNT(*)
     from TABLE_2 b
     where b.TIMESTAMP < ? and b.TIMESTAMP > ?
       AND t1.id = b.table_1_id
   ) b_count

from table_1 t1

This requires N x C subselects, one for each row in t1 against each column.

That works reasonably if the tables are not too large, or if each subquery has a good index on table_1_id so that the total subquery visits do not exceed the number of rows of a full table scan. Remember that a count(*) query can satisfied directly from index pages without looking up actual table data, so it is quite fast.

Option 2:

Perform a single pass on each table involved, and LEFT JOIN them together.

select t1.name, nvl(a.count1,0) count1, nvl(b.count2,0) count2
from table_1 t1
left join
   ( select TABLE_1_ID, COUNT(*) count1
     from TABLE_2
     group by TABLE_1_ID
   ) a ON t1.id = a.table_1_id
left join
   ( select TABLE_1_ID, COUNT(*) count2
     from TABLE_2
     where TIMESTAMP < ? and TIMESTAMP > ?
     group by TABLE_1_ID
   ) b ON t1.id = b.table_1_id

The left join subselects (grouped by TABLE_1_ID) ensures either 1 row for a TABLE1.ID, or no rows. This prevents Cartesian products occuring. Now you can get the counts, but NVL is added because in this form, it is possible for count to be returned as NULL.

In the first form shown, count is never NULL.

SQL Query joining inner selects

Answers (2)

There is suggestion that the 2nd form is preferable, but for all things performance, that depends on your data - use what works best for you

Option 1:

Option 2:

Related Questions