user2229608
user2229608

Reputation: 45

Rails computation on large table

I'll try to explain my problem. I've created a table which represents something like text model for the user(user_models) based on purchases. I have also a table term_tfs which stores user_id and term(varchar(200)) as PK plus some other numeric columns. It is basically a matrix with terms and their numeric tf_idf_norm values for model. Now I need to make a computation comparing user models, so I need to load this matrix for one user and compare it with others for different users.

The problem is, that term_tfs table is really big (cca. 13.5 mill. rows) and I need to get matrices for users with at least 5 (1285 users) or 10 (9333) purchases. When I make one select from term_tfs table it takes 20-40 millis approximately. But I need some way how to do this for other 9-thousands users to compare with. Naive approach which makes query for each user_id into term_tfs takes something above 10 s for one comparison, which is slow if I want to do this comparison for next few thousand users and store it somewhere else.

def self.compare_user(user_id)
  @results = Hash.new
  # @user_ids = UserModel.where.not(user_id: user_id).pluck(:user_id)
  @user_ids = UserModel.get_useful_ids(user_id, 5)
  @user_matrix = TermTf.where(user_id: user_id).pluck(:term,  :tf_idf_norm)
  @user_terms = @user_matrix.map { |a| a[0] }


  @user_ids.each do |id|

    matrix = TermTf.where(user_id: id).pluck(:term, :tf_idf_norm)
    store_result( compare_matrix(matrix), id )

  end

  sort_results( @results )
end

def self.compare_matrix(matrix)
  sim = 0

  matrix.each do |t|
    unless ( ( i = @user_terms.index(t[0]) ).nil? )
      sim += t[1] * @user_matrix[i][1]
    end
  end

  sim
end

def self.store_result(similarity, id)
  @results[id] = similarity
end

Benchmark output (9333 user_ids):

puts Benchmark.measure {@user_ids.each{|id| TermTf.where(user_id:  id).pluck(:term, :tf_idf_norm)}}
4.890000   0.180000   5.070000 ( 11.019708)

This seems as rather bad/slow approach, so how to make it faster? I'm all pro to hear other ways how to solve this problem too, with Ruby or SQL.

Upvotes: 1

Views: 726

Answers (2)

slowjack2k
slowjack2k

Reputation: 2586

To put Beartech's approach into Rails code instead of creating a view, you can do some thing like this (needs to be adjusted to your needs):

subquery = TermTf.where(user_id: user_id).select(:term,  :tf_idf_norm).to_sql
result = TermTf.joins("INNER JOIN (#{subquery }) sub on sub.term = term_tfs.term")
         .select("term_tfs.user_id as user_id, sum(sub.tf_idf_norm * term_tfs.tf_idf_norm) as tf_idf_norm_sum")
         .where(user_id: @user_ids)
         .where.not(user_id:  user_id)
         .group('term_tfs.user_id')
         .all

Upvotes: 2

Beartech
Beartech

Reputation: 6411

My answer would be to not do it in Rails. You say at the end that you would like to know how to do it in Ruby, but I hope you will consider a non-ruby answer. If it is slow in Rails it's because Rails is not "good" at that particular process. I had several large tables that I had to display in my app. If you watch the Rails console and it shows lots of DB requests or slow performance you should move that process to the DB. The dB designers have been tuning the DB to handle those processes for years.

I would recreate the same logic in SQL and add it as a view in your DB. You can then add a simple model like:

In your models

term_tfs_view.rb

class TermTfsView  < ActiveRecord::Base
   #this is a model for a view in the DB
end

In your DB name your table term_tfs_views and it will automatically associate this model with the table.

My SQL skills are rudimentary, else I would try and give you an example of your logic translated from Ruby/Rails to SQL. If a SQL expert can weigh in and let us know if this is practical to do using SQL it would help.

Important

Views are great for data that you want to view. You can't do updates/inserts/etc. Using a view backed model. But that does not mean they are not great for moving heavy lifting from Rails to the DB. This link explains the idea of View backed Models very well: https://dan.chak.org/enterprise-rails/chapter-11-view-backed-models/

Upvotes: 1

Related Questions