Andrew
Andrew

Reputation: 12009

Joining multiple tables in PostgreSQL, and counting joined table rows

I have three tables. I'm trying to query my PRICE_LIST table, and count the number of SKUs that have a price in each price list, as well as the number of customers assigned to each price list.

Tables

My PRICE_LIST table:

price_list_id
price_number
name

This table has 41 rows.

My SKU_PRICE table:

sku_id
price_number

This table has 1,132 rows.

My CUSTOMER table:

customer_id
price_number
customer_type
is_active

This table has 6,535 rows, but I only want customers of type "E" that are active, so the number of rows I want drops to 2,961.

Query

My query:

SELECT
  pl.price_list_id,
  pl.price_number,
  pl.name,
  count(sp.sku_id)     "sku_count",
  count(c.customer_id) "customer_count"
FROM price_list pl
  LEFT JOIN sku_price sp ON (sp.price_number = pl.price_number)
  LEFT JOIN customer c ON (c.price_number = pl.price_number)
WHERE c.customer_type = 'E' AND c.is_active = 'T'
GROUP BY pl.price_list_id, pl.price_number, pl.name;

Problem

The problem I have is that the results are just crazy:

 price_list_id | price_number | name  | sku_count | customer_count
---------------+--------------+-------+-----------+----------------
            31 |            4 | SF0   |         0 |            792
            33 |            6 | SF2   |     30525 |          30525
             2 |            2 | ASNP2 |       972 |            972
             1 |            1 | ASNP1 |      1596 |           1596
            34 |            7 | SF3   |       616 |            616
            37 |           10 | SF6   |         0 |             51
            32 |            5 | SF1   |      1144 |           1144

The fact that I'm getting a count of 30,525 for both SKU count and customer count tells me that the join is doing weird stuff. I also can't see why I don't get price lists that have no products and no customers (and there are a bunch of those).

If I join PRICE_LIST to just one table — either SKU_PRICE or CUSTOMER — I get sensible results. It's only when I try and do both at once that it b0rks.

If someone can nudge me in the right direction I'd really appreciate it.

FWIW, I'm using PostgreSQL 9.3.5 on OS X.

Upvotes: 1

Views: 2673

Answers (1)

roman
roman

Reputation: 117345

If I understand your schema, you can do something like

select
    pl.price_list_id,
    pl.price_number,
    pl.name,
    count(distinct sp.sku_id) as "sku_count",
    count(distinct c.customer_id) as "customer_count"
from price_list pl
    left outer join sku_price as sp on sp.price_number = pl.price_number
    left outer join customer as c on c.price_number = pl.price_number
where c.customer_type = 'E' and c.is_active = 'T'
group by pl.price_list_id, pl.price_number, pl.name;

But performance wise I think it would be better to do it like this:

with cte_sku_price as (
     select count(*) as cnt, price_number from sku_price group by price_number
), cte_customer as (
     select count(*) as cnt, price_number
     from customer
     where customer_type = 'E' and is_active = 'T'
     group by price_number
)
select
    pl.price_list_id,
    pl.price_number,
    pl.name,
    sp.cnt as "sku_count",
    c.cnt as "customer_count"
from price_list pl
    left outer join cte_customer as c on c.price_number = pl.price_number
    left outer join cte_sku_price as sp on sp.price_number = pl.price_number

Upvotes: 6

Related Questions