Shauno
Shauno

Reputation: 1258

Most efficient way to find overlap between two tables

Given two tables with the column "title" that is not sorted or unique:

Book
|id|title |
|1 |book_1|
|2 |book_2|
|3 |book_3|
|4 |book_4|
|5 |book_5|
|6 |book_5|
|7 |book_5|
|8 |book_6|
|9 |book_7|

UserBook
|user_id|book_id|state        |title  |
|1      |2      |"in progress"|book_2 |
|1      |4      |"completed"  |book_4 |
|1      |6      |"completed"  |book_5 |
|2      |3      |"completed"  |book_3 |
|2      |6      |"completed"  |book_5 |
|3      |1      |"completed"  |book_1 |
|3      |2      |"completed"  |book_2 |
|3      |4      |"completed"  |book_4 |
|3      |7      |"in progress"|book_5 |
|3      |8      |"completed"  |book_6 |
|3      |9      |"completed"  |book_7 |

I'd like to create a binary matrix of users and book titles with state "completed".

[0, 0, 0, 1, 1, 0, 0]
[0, 0, 1, 0, 1, 0, 0]
[1, 1, 0, 1, 0, 1, 1]

This gets the results I'd like, but has very high algorithmic complexity. I am hoping to get the results with SQL.

How much more simple could it be if state was boolean and titles were unique?

matrix = []
User.all.each do |user|
  books = Book.distinct.sort(title: :asc).pluck(:title).uniq
  user_books = UserBook.where(user: user, state: "completed").order(title: :asc).pluck(:title)
  matrix << books.map{|v| user_books.include?(v) ? 1 : 0}
end

Upvotes: 1

Views: 977

Answers (4)

Cary Swoveland
Cary Swoveland

Reputation: 110685

Should you consider producing the desired array using Ruby, rather than SQL, first read data from the table Book into an array book:

book = [
  [1, "book_1"], [2, "book_2"], [3, "book_3"], [4, "book_4"],
  [5, "book_5"], [6, "book_5"], [7, "book_5"], [8, "book_6"],
  [9, "book_7"]
] 

and data from the table UserBook into an array user_book:

user_book = [
  [1, 2, :in_progress], [1, 4, :completed], [1, 6, :completed],
  [2, 3, :completed],   [2, 6, :completed],
  [3, 1, :completed],   [3, 2, :completed], [3, 4, :completed], [3, 7, :in_progress],
  [3, 8, :completed],   [3, 9, :completed]
] 

Note the first element of each element of book, an integer, is the book_id, and the first two elements of each element of user_book, integers, are respectively the user_id and book_id.

You could then construct the desired array as follows:

h = book.map { |book_id,title| [book_id, title[/\d+\z/].to_i-1] }.to_h
  #=> {1=>0, 2=>1, 3=>2, 4=>3, 5=>4, 6=>4, 7=>4, 8=>5, 9=>6} 

cols = h.values.max + 1
  #=> 6
arr = Array.new(3) { Array.new(cols, 0) }
  #=> [[0, 0, 0, 0, 0, 0],
  #    [0, 0, 0, 0, 0, 0],
  #    [0, 0, 0, 0, 0, 0]] 

user_book.each do |user_id, book_id, status|
  arr[user_id-1][h[book_id]] = 1 if status == :completed
end

arr
  #=> [[0, 0, 0, 1, 1, 0, 0],
  #    [0, 0, 1, 0, 1, 0, 0],
  #    [1, 1, 0, 1, 0, 1, 1]] 

Upvotes: 1

Sajad Rastegar
Sajad Rastegar

Reputation: 3154

You could group UserBook by user_id and use aggregate functions to select the list of books on each group. The entire code snippets is as follows:

books = Book.order(title: :asc).pluck(:title).uniq
matrix = []
UserBook.where(state: "completed")
        .select("string_agg(title, ',') as grouped_name")
        .group(:user_id)
        .each do |group|
  user_books = group.grouped_name.split(',')
  matrix << books.map { |title| user_books.include?(title) ? 1 : 0 }
end

In MySQL you need to replace string_agg(title, ',') with GROUP_CONCAT(title)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269953

SQL is not very good at matrices. But you can store the values as (x,y) pairs. You want to include 0 values as well as 1, so the idea is to generate the rows using a cross join and then bring in the existing data:

select b.book_id, u.user_id,
       (case when ub.id is not null then 1 else 0 end) as is_completed
from books b cross join
     users u left join
     user_books ub
     on ub.user_id = u.id and
        ub.book_id = b.id and
        ub.state = 'completed';

Upvotes: 2

Nick Ellis
Nick Ellis

Reputation: 1077

in straight SQL

select * from books join user_books on (books.id = user_books.id) 
where user_books.state = 'completed';

In Ruby ActiveRecord

Book.joins(:user_books).where(:state => 'completed')

Upvotes: 0

Related Questions