larrydalmeida

Reputation: 1630

Most efficient query that matches multiple fields in different tables

I am working on the FreeCodeCamp Book Trading Club project. I have the following relations in my PostgreSQL database:

Users

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  google_id NUMERIC NOT NULL
);

Sample data for users

==================
| id | google_id |
==================
| 6  | Tyrion    |
------------------
| 8  | Jon       |
==================

Books

CREATE TABLE books (
  id VARCHAR PRIMARY KEY,
  title TEXT NOT NULL
);

Sample data for books

=============================
| id          | title       |
=============================
| Kh5NawYsmBc | Banana Wars |
-----------------------------
| H0UULR931e4 | I, Robot    |
-----------------------------
| VIaOhHb/L98 | Sapiens     |
=============================

User Books Index Table

CREATE TABLE user_books (
  user_id INTEGER REFERENCES users(id),
  book_id VARCHAR REFERENCES books(id),
  status VARCHAR
);

Sample data for user_books

==================================
| user_id | book_id     | status |
==================================
| 8       | Kh5NawYsmBc | has    |
----------------------------------
| 6       | H0UULR931e4 | has    |
----------------------------------
| 6       | Kh5NawYsmBc | wants  |
----------------------------------
| 8       | H0UULR931e4 | wants  |
----------------------------------
| 6       | VIaOhHb/L98 | has    |
==================================

There are more fields, but they are not relevant to the problem, and I have shown only these for simplicity. This is what I am trying to do:

  1. When a user, say Tyrion, wants a book, the book will be added to the books table and also to the user_books table (book_id, user_id) and the status field in the user_books table will be set to 'wants'.

  2. Next, I need to check if any other user in the user_books table has the book by searching the user_books table for the book_id that Tyrion wants. Only the rows that have the status as 'has' should be selected.

  3. Then for each of these users that have the book Tyrion wants, I need to check if Tyrion has any books that they want.

Several users who have the book Tyrion wants may also want one of Tyrion's books, and Tyrion may have many books that other users want. However, a single match is sufficient.

So if user Jon wants a book that Tyrion has, we have a match and this is the result that I want to be returned.

These are my concerns:

  1. Is it possible to do all of it in a single query?
  2. Is the structure of my database optimal for this type of querying at scale?
  3. What's the most efficient way to do this?

I am using a Node, Express server for the back end of this app.

I apologize if the terms I have used are not expressive or if the answer already exists. I have searched SO but couldn't find the right answer or the terms I am using are incorrect. I am a beginner at SQL databases.

Update

I have updated the table creation of Users to remove the UNIQUE constraint on the PRIMARY KEY because, as many rightly pointed out, it's useless. I have also corrected the data types.

This is the solution I came up with. It works for me with 2 users and 2 books, but I suspect it might perform terribly with more users:

SELECT 
  A.book_id AS book_id, 
  A.user_id AS user_one_id, 
  A.status AS user_one_status, 
  B.user_id AS user_two_id, 
  B.status AS user_two_status
FROM (


  --- BOOKS THAT USERS WITH REQUESTED BOOK WANT
  SELECT A1.book_id, A1.user_id, A1.status
  FROM user_books AS A1
  INNER JOIN (

    SELECT *
    FROM user_books 
    WHERE book_id = '${reqBookId}' AND status = 'has'

  ) AS A2
  ON A1.user_id = A2.user_id
  WHERE A1.status = 'wants'


) AS A
INNER JOIN (


  --- BOOKS THAT THE REQUESTING USER HAS
  SELECT *
  FROM user_books
  WHERE user_id = ${reqUserId} AND status = 'has'


) AS B
ON A.book_id = B.book_id
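
A rough way to check a query of this shape against the sample data is the Python sketch below. It uses the standard-library sqlite3 module with an in-memory database purely as a stand-in for PostgreSQL (an assumption, so that the example is self-contained), stores google_id as text to match the sample rows, and binds the two inputs as parameters instead of interpolating them into the SQL string. The names make_sample_db and find_trade_match are hypothetical, and LIMIT 1 reflects the "one match is sufficient" requirement.

```python
import sqlite3

def make_sample_db():
    """Build an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, google_id TEXT NOT NULL);
        CREATE TABLE books (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE user_books (
            user_id INTEGER REFERENCES users(id),
            book_id TEXT REFERENCES books(id),
            status TEXT
        );
        INSERT INTO users VALUES (6, 'Tyrion'), (8, 'Jon');
        INSERT INTO books VALUES
            ('Kh5NawYsmBc', 'Banana Wars'),
            ('H0UULR931e4', 'I, Robot'),
            ('VIaOhHb/L98', 'Sapiens');
        INSERT INTO user_books VALUES
            (8, 'Kh5NawYsmBc', 'has'),
            (6, 'H0UULR931e4', 'has'),
            (6, 'Kh5NawYsmBc', 'wants'),
            (8, 'H0UULR931e4', 'wants'),
            (6, 'VIaOhHb/L98', 'has');
    """)
    return conn

# Same structure as the query above; the first ? is the requested book id,
# the second ? is the requesting user's id.
MATCH_QUERY = """
    SELECT A.book_id, A.user_id AS user_one_id, B.user_id AS user_two_id
    FROM (
        -- books wanted by users who have the requested book
        SELECT A1.book_id, A1.user_id
        FROM user_books AS A1
        INNER JOIN (
            SELECT user_id FROM user_books
            WHERE book_id = ? AND status = 'has'
        ) AS A2 ON A1.user_id = A2.user_id
        WHERE A1.status = 'wants'
    ) AS A
    INNER JOIN (
        -- books that the requesting user has
        SELECT book_id, user_id FROM user_books
        WHERE user_id = ? AND status = 'has'
    ) AS B ON A.book_id = B.book_id
    LIMIT 1
"""

def find_trade_match(conn, req_book_id, req_user_id):
    """Return one (book_id, owner_id, requester_id) row, or None if no trade exists."""
    return conn.execute(MATCH_QUERY, (req_book_id, req_user_id)).fetchone()

match = find_trade_match(make_sample_db(), "Kh5NawYsmBc", 6)
```

With the pg driver in Node the same idea would use numbered placeholders ($1, $2) and a values array rather than template-string interpolation.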

Upvotes: 0

Views: 303

Answers (4)

larrydalmeida

Reputation: 1630

This is the query I have used after taking tips from the answers posted by Laurenz Albe and jWolf:

SELECT
  u1.city_id,
  ub3.user_id AS user_two_id,
  ub3.book_id
FROM user_books ub1
  INNER JOIN users AS u1 ON ub1.user_id = u1.id
  INNER JOIN user_books AS ub2 ON ub1.user_id = ub2.user_id
  INNER JOIN user_books AS ub3 ON ub2.book_id = ub3.book_id
  INNER JOIN users AS u2 ON ub2.user_id = u2.id
WHERE
  ub1.book_id = '{requestedBookId}' AND
  ub1.user_id = {requestingUsersId} AND
  ub2.status = 'has' AND
  ub3.status = 'wants' AND
  u1.city_id = u2.city_id AND
  ub3.book_id NOT IN (SELECT user1_book_requested FROM trades) AND
  ub3.book_id NOT IN (SELECT user2_book_requested FROM trades)

The last two clauses in the WHERE only ensure that the book being selected for a trade match is not already part of a trade.

Thanks guys!

Upvotes: 0

jwolf

Reputation: 938

Step 1 is its own thing and not quite right (more on that later), but the rest can be done in a single query with a (pretty awesome) triple self join:

select ub2.user_id, ub2.book_id, u.google_id, b.title 
    from user_books ub1
    inner join user_books ub2 on ub2.user_id = ub1.user_id   
    inner join user_books ub3 on ub3.book_id = ub2.book_id 
    inner join books b on b.id = ub2.book_id
    inner join users u on u.id = ub2.user_id
    where 
        ub1.book_id = {the book Tyrion wants} and ub1.status = 'has' 
        and ub2.status = 'wants'
        and ub3.user_id = {Tyrion's id} and ub3.status = 'has'

In ub1, we get the list of all users who have the book Tyrion wants. In ub2, we get all the books that those users want. In ub3, we find the books that Tyrion has to trade and the intersection of those, if it exists, is the list of viable trades.
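
As a rough illustration of how the three aliases line up, here is a Python sketch that runs a triple self join of this shape on the question's sample rows. It assumes an in-memory SQLite database (standard-library sqlite3) in place of PostgreSQL, uses the id column names from the question's schema (books.id, users.id), and the helper names make_db and find_trades are hypothetical.

```python
import sqlite3

def make_db():
    """In-memory copy of the question's sample data (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, google_id TEXT NOT NULL);
        CREATE TABLE books (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE user_books (user_id INTEGER, book_id TEXT, status TEXT);
        INSERT INTO users VALUES (6, 'Tyrion'), (8, 'Jon');
        INSERT INTO books VALUES
            ('Kh5NawYsmBc', 'Banana Wars'),
            ('H0UULR931e4', 'I, Robot'),
            ('VIaOhHb/L98', 'Sapiens');
        INSERT INTO user_books VALUES
            (8, 'Kh5NawYsmBc', 'has'),
            (6, 'H0UULR931e4', 'has'),
            (6, 'Kh5NawYsmBc', 'wants'),
            (8, 'H0UULR931e4', 'wants'),
            (6, 'VIaOhHb/L98', 'has');
    """)
    return conn

TRIPLE_SELF_JOIN = """
    SELECT ub2.user_id, ub2.book_id, u.google_id, b.title
    FROM user_books ub1                                      -- owners of the wanted book
    INNER JOIN user_books ub2 ON ub2.user_id = ub1.user_id   -- what those owners want
    INNER JOIN user_books ub3 ON ub3.book_id = ub2.book_id   -- what the requester has
    INNER JOIN books b ON b.id = ub2.book_id
    INNER JOIN users u ON u.id = ub2.user_id
    WHERE ub1.book_id = ? AND ub1.status = 'has'
      AND ub2.status = 'wants'
      AND ub3.user_id = ? AND ub3.status = 'has'
"""

def find_trades(conn, wanted_book_id, requester_id):
    """Return every (owner_id, book_to_give, owner_name, title) viable trade."""
    return conn.execute(TRIPLE_SELF_JOIN, (wanted_book_id, requester_id)).fetchall()

trades = find_trades(make_db(), "Kh5NawYsmBc", 6)
```

Here Tyrion (id 6) asks for 'Banana Wars': ub1 finds Jon as its owner, ub2 finds that Jon wants 'I, Robot', and ub3 confirms Tyrion has it.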

This method could also be expanded into larger multi-step, multi-person trades by adding more self joins. The self joins are the heart of the query; the additional joins to users and books only need to be done once at the end to get the final name and title - we don't need those for the intermediate steps along the way.

Part 1 of the question has a small problem: you can't just create a new book_id whenever a request is made, or any given book will get a different id every time someone requests it and no matches will ever be made. So you'll have to do a look-up to see if it's already in the database (the look-up will have to be pretty fuzzy to account for variations and misspellings if you're looking it up by title; if you can count on a universal book id like a UPC or ISBN, great). If it is not found, add the row to the books table. If the book is found, don't add it to the books table, then...

The exact same goes for users: do a look-up; if he's not in the users table add him.

Now that you have either verified or added both the book_id and the user_id, you can add the request to the user_books table. If either the book or the user was new, stop: either he's looking for a book nobody has, or he has no books to trade yet, and the most you can do is record that the book is being requested, which you've done. If neither the book nor the user is new, run the query.
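
The look-up-then-insert flow described above might be sketched as follows. This is only an illustration under assumptions: it uses Python's standard-library sqlite3 with an in-memory database instead of PostgreSQL, it assumes book ids are stable external identifiers (so the look-up is an exact match rather than a fuzzy title search), and the names make_schema, row_exists and register_request are hypothetical.

```python
import sqlite3

def make_schema():
    """Create empty tables shaped like the ones in the question (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, google_id TEXT NOT NULL);
        CREATE TABLE books (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE user_books (user_id INTEGER, book_id TEXT, status TEXT);
    """)
    return conn

def row_exists(conn, query, key):
    """Run a single-parameter look-up and report whether any row came back."""
    return conn.execute(query, (key,)).fetchone() is not None

def register_request(conn, user_id, google_id, book_id, title):
    """Record a 'wants' row; return True only if running the match query makes sense."""
    book_known = row_exists(conn, "SELECT 1 FROM books WHERE id = ?", book_id)
    user_known = row_exists(conn, "SELECT 1 FROM users WHERE id = ?", user_id)
    if not book_known:
        conn.execute("INSERT INTO books (id, title) VALUES (?, ?)", (book_id, title))
    if not user_known:
        conn.execute("INSERT INTO users (id, google_id) VALUES (?, ?)", (user_id, google_id))
    # The request itself is always catalogued.
    conn.execute(
        "INSERT INTO user_books (user_id, book_id, status) VALUES (?, ?, 'wants')",
        (user_id, book_id),
    )
    # A brand-new book has no owner yet, and a brand-new user has nothing to trade.
    return book_known and user_known
```

In PostgreSQL the two conditional inserts could likely be collapsed into INSERT ... ON CONFLICT DO NOTHING statements, which also avoids a race between the look-up and the insert.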

I hope this helps.

Upvotes: 1

Barani

Reputation: 58

The following query returns the users who have a book that 'Jon' wants, giving priority to those who also want a book that Jon has. I've included the sample INSERT statements that I used for testing.

INSERT INTO users (google_id) VALUES ('Tyrion');
INSERT INTO users (google_id) VALUES ('Jon');
INSERT INTO users (google_id) VALUES ('Robert');
INSERT INTO users (google_id) VALUES ('Victor');

Inserting into Books table.

INSERT INTO books VALUES ('Kh5NawYsmBc', 'Banana Wars');
INSERT INTO books VALUES ('H0UULR931e4', 'I, Robot');
INSERT INTO books VALUES ('VIaOhHb/L98', 'Sapiens');
INSERT INTO books VALUES ('RanDomNum1', 'Let us C');
INSERT INTO books VALUES ('RanDomNum2', 'Teach yourself Java');

Inserting into user_Books table.

INSERT INTO user_books VALUES (2, 'Kh5NawYsmBc', 'has');
INSERT INTO user_books VALUES (1, 'H0UULR931e4', 'has');
INSERT INTO user_books VALUES (1, 'Kh5NawYsmBc', 'wants');
INSERT INTO user_books VALUES (1, 'H0UULR931e4', 'wants');
INSERT INTO user_books VALUES (2, 'VIaOhHb/L98', 'has');
INSERT INTO user_books VALUES (3, 'RanDomNum1', 'has');
INSERT INTO user_books VALUES (4, 'RanDomNum2', 'has');
INSERT INTO user_books VALUES (4, 'VIaOhHb/L98', 'has');
INSERT INTO user_books VALUES (2, 'H0UULR931e4', 'wants');
INSERT INTO user_books VALUES (4, 'H0UULR931e4', 'has');

Query:

select sq2.google_id, sq5.title from 
    -- sq1: books that Jon wants
    (select u1.*, ub1.*, b1.id as bkid, b1.title from users u1 join user_books ub1 on u1.id = ub1.user_id
    join books b1 on ub1.book_id = b1.id where u1.google_id = 'Jon' and ub1.status = 'wants'
    ) sq1 
inner join 
    -- sq2: other users who have those books
    (select u1.*, ub1.*, b1.id as bkid, b1.title from users u1 join user_books ub1 on u1.id = ub1.user_id
    join books b1 on ub1.book_id = b1.id where ub1.status = 'has' and u1.google_id <> 'Jon'
    ) sq2 on  sq1.bkid = sq2.bkid 
left join
    -- sq5: books Jon has (sq3) that other users want (sq4)
    (select sq3.google_id as hasid, sq4.google_id as wantsid, sq3.title from 
        (select u1.*, ub1.*, b1.id as bkid, b1.title from users u1 join user_books ub1 on u1.id = ub1.user_id
        join books b1 on ub1.book_id = b1.id where ub1.status = 'has' and u1.google_id = 'Jon'
        ) sq3 
        inner join 
        (select u1.*, ub1.*, b1.id as bkid, b1.title from users u1 join user_books ub1 on u1.id = ub1.user_id
        join books b1 on ub1.book_id = b1.id where ub1.status = 'wants' and u1.google_id <> 'Jon'
        ) sq4
        on sq3.book_id = sq4.book_id
    ) as sq5
on sq2.google_id = sq5.wantsid
-- PostgreSQL sorts NULLs first under DESC by default, so ask for them last
order by 2 desc nulls last

Here is the result:

google_id                                          title
---------------------------------------------- -----------------------------
Tyrion                                             Banana Wars
Victor                                             NULL

Upvotes: 0

Laurenz Albe

Reputation: 248305

To find all users and books that might be exchanged for 'The interesting book' that 'Tyrion' wants, you could run something like this:

SELECT u2.google_id, b1.title
FROM users u1
   JOIN user_books ub1 ON u1.id = ub1.user_id
   JOIN books b1 ON ub1.book_id = b1.id
   JOIN user_books ub2 ON b1.id = ub2.book_id
   JOIN users u2 ON u2.id = ub2.user_id
   JOIN user_books ub3 ON ub3.user_id = u2.id
   JOIN books b2 ON b2.id = ub3.book_id
WHERE u1.google_id = 'Tyrion'
  AND ub1.status = 'has'
  AND ub2.status = 'wants'
  AND ub3.status = 'has'
  AND b2.title = 'The interesting book';

The query should be as efficient as it gets if you have proper indexes on all columns involved in nested loop joins and all columns in WHERE clauses except user_books.status.

I think that your table structure makes sense, except for the redundant UNIQUE constraints and the fact that not all artificial primary keys are numbers. user_books should have a primary key on (user_id, book_id).
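
A minimal sketch of that table layout follows, again using Python's standard-library sqlite3 and an in-memory database as a stand-in for PostgreSQL (an assumption made to keep the example runnable on its own). The index name user_books_book_id_idx and the helper try_insert are hypothetical. The composite primary key already serves look-ups that start from user_id, so the extra index covers look-ups that start from book_id.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, google_id TEXT NOT NULL);
CREATE TABLE books (id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE user_books (
    user_id INTEGER NOT NULL REFERENCES users(id),
    book_id TEXT NOT NULL REFERENCES books(id),
    status TEXT NOT NULL,
    PRIMARY KEY (user_id, book_id)   -- one row per user and book
);
-- hypothetical index name; supports joins and filters on book_id
CREATE INDEX user_books_book_id_idx ON user_books (book_id);
"""

def try_insert(conn, user_id, book_id, status):
    """Insert a user_books row; return False if the primary key rejects it."""
    try:
        conn.execute(
            "INSERT INTO user_books (user_id, book_id, status) VALUES (?, ?, ?)",
            (user_id, book_id, status),
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (6, 'Tyrion')")
conn.execute("INSERT INTO books VALUES ('H0UULR931e4', 'I, Robot')")

first = try_insert(conn, 6, "H0UULR931e4", "has")
# The same user cannot both have and want the same book under this key.
second = try_insert(conn, 6, "H0UULR931e4", "wants")
```

One consequence of this key is that a user cannot hold a 'has' and a 'wants' row for the same book at once, which seems to be the intended behaviour here.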

Upvotes: 0
