dave
dave

Reputation: 7857

Postgres - Multiple joins is causing my query to return incorrect data

I have three tables in my database. Their schema is basically:

books:
id | book_title

books_tags:
id | book_id | tag_id

books_votes:
id | book_id | vote

The idea is to be able to search for books and filter by given tags (716 and 101, in this case). The total_votes is used in the ORDER BY caluse.

SELECT
    books.id AS books_id, sum(book_votes.vote) AS total_votes 
FROM
    books
JOIN
    -- works fine without this join
    book_votes ON books.id = book_votes.book_id
JOIN
    books_tags ON books.id = books_tags.book_id
WHERE
    books_tags.tag_id IN (716, 101)
GROUP BY
    books.id 
HAVING
    count(books.id) = 2

The tag filtering, by itself, works great. I can add as many tag ids to the IN clause as I wish and it will continue to filter the results to only show books with those tags. Perfect.

The problem occurs when I add in the second JOIN to the books_votes table. This join doesn't produce any errors it just causes the query to return the wrong data - Like it's ignoring the tag ids.

What's wrong with having the second join?

EDIT:

Here's the dumps from the tables:

books:
 id |   book_title  
----+-----------------
  1 | first
  2 | second
  3 | third book
  4 | fourth book
  5 | fifth
  6 | sixth book

books_tags:
 id | book_id | tag_id 
----+---------+--------
  1 |       1 |    293
  2 |       1 |     32
  3 |       1 |    370
  4 |       2 |    101
  5 |       2 |    357
  6 |       3 |    554
  7 |       3 |    808
  8 |       3 |    716
  9 |       3 |    101
 10 |       4 |    787
 11 |       4 |    808
 12 |       4 |    322
 13 |       5 |    787
 17 |       6 |    716
 18 |       6 |    554
 19 |       6 |    101

books_votes:
 id | book_id | vote 
----+---------+------
  2 |       2 |    1
  3 |       3 |    1
  4 |       4 |    1
  7 |       4 |    1
  8 |       2 |    1
 11 |       5 |    1
 12 |       5 |    1
 13 |       1 |    1

Here's the data returned, from the query I posted, when I leave out the second join (to books_votes):

 book_id 
---------
   6
   3

As you can see, the correct books are returned. Books 6 and 3 have been tagged with the ids 716 and 101.

Here's what's returned when I run the query with the books_votes table joined:

 book_id | total_votes 
---------+-------------
    3    |    2
    2    |    3

Upvotes: 2

Views: 2707

Answers (2)

Build complex SQL step by step.

This gives you the books that have both the required tags. It's only as reliable as your table definition. Your table definition shouldn't allow one book to have the same tag twice. You need a UNIQUE constraint on (book_id, tag_id).

SELECT book_id 
FROM books_tags
WHERE books_tags.tag_id IN (716, 101)
GROUP BY book_id
HAVING COUNT(tag_id) = 2

book_id
--
6
3

You can use that in a JOIN.

SELECT books.id
FROM books
INNER JOIN (
    SELECT book_id 
    FROM books_tags
    WHERE books_tags.tag_id IN (716, 101)
    GROUP BY book_id
    HAVING COUNT(tag_id) = 2) bt ON bt.book_id = books.id

book_id
--
6
3

Joining on the table of votes should drop book_id 6 from the result. (No votes for 6.)

SELECT books.id
FROM books
INNER JOIN (
    SELECT book_id 
    FROM books_tags
    WHERE books_tags.tag_id IN (716, 101)
    GROUP BY book_id
    HAVING COUNT(tag_id) = 2) bt ON bt.book_id = books.id
INNER JOIN books_votes bv ON bv.book_id = books.id

book_id
--
3

Now you can add the vote column to the query.

SELECT books.id, bv.vote
FROM books
INNER JOIN (
    SELECT book_id 
        FROM books_tags
    WHERE books_tags.tag_id IN (716, 101)
    GROUP BY book_id
    HAVING COUNT(tag_id) = 2) bt ON bt.book_id = books.id
INNER JOIN books_votes bv ON bv.book_id = books.id

book_id  vote
--
3        1

Finally, you can sum the votes.

SELECT books.id, SUM(bv.vote) AS total_votes
FROM books
INNER JOIN (
    SELECT book_id 
        FROM books_tags
    WHERE books_tags.tag_id IN (716, 101)
    GROUP BY book_id
    HAVING COUNT(tag_id) = 2) bt ON bt.book_id = books.id
INNER JOIN books_votes bv ON bv.book_id = books.id
GROUP BY books.id;

book_id  total_votes
--
3        1

Your version doesn't work, because it returns the wrong book id numbers. The combination of the JOIN on books_votes and the WHERE clause doesn't do what you expected it to do.

SELECT books.id AS books_id
FROM books
JOIN books_votes ON books.id = books_votes.book_id
JOIN books_tags ON books.id = books_tags.book_id
WHERE books_tags.tag_id IN (716, 101)
GROUP BY books.id 

books_id
--
3
2

Book 2 is included not because it has both tags, but because it has two votes.

SELECT books.id AS books_id, books_tags.tag_id, books_votes.vote
FROM books
JOIN books_votes ON books.id = books_votes.book_id
JOIN books_tags ON books.id = books_tags.book_id
WHERE books_tags.tag_id IN (716, 101)
ORDER BY books_id, tag_id

book_id  tag_id     vote
--
2        101        1
2        101        1
3        101        1
3        716        1

Upvotes: 4

Alex Aza
Alex Aza

Reputation: 78447

As I understand you need all the books that have tags 716 and 101, and you need the vote count per each book.

select *,
    (select count(*) from book_votes as vts where vts.book_id = bks.id) as vote_count
from books as bks
where 
    id in 
    (
        select book_id
        from books_tags as tgs
        where tgs.tag_id in (716, 101)
        group by book_id
        having count(*) = 2
    )

Result:

id          book_title      vote_count
----------- --------------- -----------
3           third book      1
6           sixth book      0

Upvotes: 0

Related Questions