aquitted-mind
aquitted-mind

Reputation: 303

Join across multiple tables with partial counts

Each company, has products and each product has entries in detail1, detail2, detail3 table.

**Table Company**
cid |   cname   
-----+-----------
100 | Company 1
101 | Company 2

**Table Product**
pid  | cid |   dname   
------+-----+-----------
1000 | 100 | Product A
2000 | 101 | Product B

**Table detail1**
pid  | state |          datetime          
------+-------+----------------------------
1000 | A     | 2013-06-03 11:49:49.224992
1000 | B     | 2013-06-03 11:49:49.226124
1000 | B     | 2013-06-03 11:49:49.228573
1000 | B     | 2013-06-03 11:49:49.23136
1000 | A     | 2013-06-03 11:49:49.233897
2000 | A     | 2013-06-03 11:49:49.243572
2000 | B     | 2013-06-03 11:49:49.245899

**Table detail2**
pid  | type |          datetime          
------+------+----------------------------
1000 | T1   | 2013-06-03 11:49:49.257978
1000 | T1   | 2013-06-03 11:49:49.258865
1000 | T1   | 2013-06-03 11:49:49.261212
1000 | T1   | 2013-06-03 11:49:49.263515
2000 | T1   | 2013-06-03 11:49:49.270654

**Table detail3**
pid  | quality |          datetime          
------+---------+----------------------------
1000 | Q1      | 2013-06-03 11:49:49.280894
1000 | Q1      | 2013-06-03 11:49:49.281786
1000 | Q1      | 2013-06-03 11:49:49.284011
2000 | Q1      | 2013-06-03 11:49:49.287797
2000 | Q1      | 2013-06-03 11:49:49.288629
2000 | Q1      | 2013-06-03 11:49:49.289587

I am looking for a query that would return data as follows:

CompanyID  CompanyName  detail1.StateA  detail1.stateB  count(detail2) count(detail3)
---------- ------------ --------------- --------------- -------------- ---------------
100        Company 1         2               3                4             3
101        Company 2         1               1                1             2 

I might further restrict the result based on datetime constraint.

Upvotes: 2

Views: 172

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 659357

SELECT c.cid
      ,c.cname
      ,sum(d1.d1_a_ct) AS d1_a_ct
      ,sum(d1.d1_b_ct) AS d1_b_ct
      ,sum(d2.d2_ct)   AS d2_ct
      ,sum(d3.d3_ct)   AS d3_ct
FROM   company c
LEFT   JOIN product p USING (cid)
LEFT   JOIN (
   SELECT pid, count(state = 'A' OR NULL) AS d1_a_ct
              ,count(state = 'B' OR NULL) AS d1_b_ct
   FROM   detail1
   -- WHERE datetime >= '2013-06-03 11:45:00'
   -- AND   datetime <  '2013-06-05 15:00:00'
   GROUP  BY pid
   ) d1   USING (pid)
LEFT   JOIN (
   SELECT pid, count(*) AS d2_ct
   FROM   detail2
   GROUP  BY pid
   ) d2   USING (pid)
LEFT   JOIN (
   SELECT pid, count(*) AS d3_ct
   FROM   detail3
   GROUP  BY pid
   ) d3   USING (pid);
GROUP BY  c.cid, c.cname;

It is important to avoid a "proxy cross join" in a situation like this. If you join to multiple n-tables (detail1, detail2, ...) and each can have multiple related rows, the rows would multiply each other.
To avoid the problem first aggregate the detail-tables so there is only 1 row per product. Then it's no problem to join all of them at once to the respective product.

More explanation in this related answer:
Two SQL LEFT JOINS produce incorrect result

I also use LEFT JOIN, even though you wrote that "each product has entries in ..". Can't hurt. Else, if there wouldn't be related rows in one of the detail-tables, you'd lose the whole company from the result.

I did the same for products, so you even get companies without any products at all.

Here is an explanation on how the partial counts with count(state = 'A' OR NULL) work:
Compute percents from SUM() in the same SELECT sql query

It is simple to further restrict on the datetime column. I added a commented WHERE clause. Note the use of >= and < to avoid a common mistake with timestamp ranges.

Upvotes: 2

Related Questions