Reputation: 419
I have 3 tables:
users(id, account_balance)
grocery(user_id, date, amount_paid)
fishmarket(user_id, date, amount_paid)
Both fishmarket
and grocery
tables may have multiple occurrences for the same user_id with different dates and amounts paid or have nothing at all for any given user.
When I try the following query:
SELECT
t1."id" AS "User ID",
t1.account_balance AS "Account Balance",
count(t2.user_id) AS "# of grocery visits",
count(t3.user_id) AS "# of fishmarket visits"
FROM users t1
LEFT OUTER JOIN grocery t2 ON (t2.user_id=t1."id")
LEFT OUTER JOIN fishmarket t3 ON (t3.user_id=t1."id")
GROUP BY t1.account_balance,t1.id
ORDER BY t1.id
It produces an incorrect results: "1", "12", "12"
.
But when I try to LEFT JOIN
to just one table it produces a correct results for either grocery
or fishmarket
visits, which are "1", "3", "4"
.
What am I doing wrong here?
I am using PostgreSQL 9.1.
Upvotes: 40
Views: 34433
Reputation: 1
SELECT
t1."id" AS "User ID",
t1.account_balance AS "Account Balance",
Sum(Case When t2.user_id is null then 0 else 1 end) AS "# of grocery visits",
Sum(Case When t3.user_id is null then 0 else 1 end) AS "# of fishmarket visits"
FROM users t1
LEFT OUTER JOIN grocery t2 ON (t2.user_id=t1."id")
LEFT OUTER JOIN fishmarket t3 ON (t3.user_id=t1."id")
GROUP BY t1.account_balance,t1.id
ORDER BY t1.id
The above will also allow you to add additional criteria as needed
Upvotes: -2
Reputation: 656666
Joins are processed left to right (unless parentheses dictate otherwise). If you LEFT JOIN
(or just JOIN
, similar effect) three groceries to one user you get 3 rows (1 x 3). If you then join 4 fishmarkets for the same user, you get 12 (3 x 4) rows, multiplying the previous count in the result, not adding to it, like you may have hoped for.
Thereby multiplying the visits for groceries and fishmarkets alike.
You can make it work like this:
SELECT u.id
, u.account_balance
, g.grocery_visits
, f.fishmarket_visits
FROM users u
LEFT JOIN (
SELECT user_id, count(*) AS grocery_visits
FROM grocery
GROUP BY user_id
) g ON g.user_id = u.id
LEFT JOIN (
SELECT user_id, count(*) AS fishmarket_visits
FROM fishmarket
GROUP BY user_id
) f ON f.user_id = u.id
ORDER BY u.id;
To get aggregated values for one or few users, correlated subqueries like @Vince provided are just fine. For a whole table or major parts of it, it is (much) more efficient to aggregate the n-tables and join to the result once. This way, we also do not need another GROUP BY
in the outer query.
grocery_visits
and fishmarket_visits
are NULL
for users without any related entries in the respective tables. If you need 0
instead (or any arbitrary number), use COALESCE
in the outer SELECT
:
SELECT u.id
, u.account_balance
, COALESCE(g.grocery_visits , 0) AS grocery_visits
, COALESCE(f.fishmarket_visits, 0) AS fishmarket_visits
FROM ...
Upvotes: 73
Reputation: 364
For your original query, if you take away the group by to look at the pre-grouped result, you'll see why the counts your were receiving were created.
Perhaps the following query utilizing subqueries would achieve your intended result:
SELECT
t1."id" AS "User ID",
t1.account_balance AS "Account Balance",
(SELECT count(*) FROM grocery t2 ON (t2.user_id=t1."id")) AS "# of grocery visits",
(SELECT count(*) FROM fishmarket t3 ON (t3.user_id=t1."id")) AS "# of fishmarket visits"
FROM users t1
ORDER BY t1.id
Upvotes: 15
Reputation: 3400
It's because when the user table joins to the grocery table, there are 3 records matched. Then each of those three records matches with the 4 records in fishmarket, producing 12 records. You need subqueries to get what you are looking for.
Upvotes: 4