Reputation: 2367
I have two tables, users and notes, with more than 50k records in users and 90k records in notes.
class User < ActiveRecord::Base
has_many :notes
end
Schema of users_table:
id email created_at
Schema of notes_table:
id user_id created_at category impact_score
I am trying to create a temporary table temp_user_notes that holds data from both tables, so that I can run ActiveRecord queries against it to fetch the data.
Schema of temp_user_notes_table:
id user_id notes_id email user_created_at notes_created_at category impact_score
I am doing this:
def self.populate_temp_user_notes
  users = User.all
  users.each do |user|
    user.notes.each do |note|
      user_notes = TempUserNote.new
      user_notes.user_id = user.id
      user_notes.notes_id = note.id
      user_notes.auth_token_created_at = user.auth_token_created_at
      user_notes.notes_recorded_at = note.recorded_at
      user_notes.category = note.category
      user_notes.well_being_score = note.perception_score
      user_notes.save
    end
  end
end
Looping through all the users and their notes is a slow, memory-hungry process. What other approach can I use?
EDIT:
My requirement: I have a series of queries, combined with AND and OR conditions, that use the following tables: users, notes, transactions, subscriptions. Suppose the query to get my target users is (Query1 OR Query2) AND Query3 AND Query4; the output of each query is the input of the next.
eg:
total users in DB = 1000
1. user_list = (Query1 OR Query2) #=> 500 users
2. the 500 users from user_list are the input for the next query
3. user_list = user_list AND Query3 #=> 300 users
4. the 300 users from step 3 are the input for the query in step 5
5. user_list = user_list AND Query4 #=> 50 users
The final user_list contains my target users and their notes.
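The composition described above is just set algebra over user ids: OR is a union, AND is an intersection. A minimal plain-Ruby sketch with hypothetical id lists standing in for each query's result:

```ruby
# Hypothetical id lists standing in for the result of each query.
query1 = [1, 2, 3, 4, 5]
query2 = [4, 5, 6, 7]
query3 = [2, 3, 5, 6]
query4 = [3, 5, 9]

# (Query1 OR Query2) is a union; each AND is an intersection.
user_list = (query1 | query2) & query3 & query4
# user_list == [3, 5]
```

With ActiveRecord relations, the same composition can stay in the database: `User.where(...).or(User.where(...))` for the OR step, then chained `where`/`merge` calls for each AND step, so only the final, smallest set of users is materialized.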
Upvotes: 2
Views: 2335
Reputation: 106922
I would use a combination of find_each and includes to decrease both memory usage and the number of database queries:
def self.populate_temp_user_notes
  User.includes(:notes).find_each do |user|
    user.notes.each do |note|
      TempUserNote.create(
        user_id: user.id,
        notes_id: note.id,
        auth_token_created_at: user.auth_token_created_at,
        notes_recorded_at: note.recorded_at,
        category: note.category,
        well_being_score: note.perception_score,
      )
    end
  end
end
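For context on why includes helps: it replaces the one-notes-query-per-user pattern (the N+1 problem) with a single query whose rows Rails then hands out per user. A plain-Ruby sketch of that grouping, using hypothetical rows:

```ruby
# Hypothetical note rows; includes(:notes) effectively loads all notes
# in one query and hands each user its slice, instead of issuing one
# notes query per user (the N+1 problem).
notes = [
  { id: 10, user_id: 1 },
  { id: 11, user_id: 1 },
  { id: 12, user_id: 2 },
]
notes_by_user = notes.group_by { |n| n[:user_id] }
# notes_by_user[1].map { |n| n[:id] } == [10, 11]
```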
Another very fast option might be to do this in plain SQL, like this:
INSERT INTO temp_user_notes
(user_id, notes_id, auth_token_created_at, notes_recorded_at, category, well_being_score)
SELECT users.id, notes.id, users.auth_token_created_at, notes.recorded_at, notes.category, notes.perception_score
FROM users INNER JOIN notes ON users.id = notes.user_id;
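To see what rows that INSERT ... SELECT produces, here is a plain-Ruby simulation of the inner join over hypothetical in-memory rows (the column names follow the SQL above; the sample values are made up):

```ruby
# Hypothetical in-memory rows standing in for the users and notes tables.
users = [
  { id: 1, auth_token_created_at: "2015-01-01" },
  { id: 2, auth_token_created_at: "2015-02-01" },
]
notes = [
  { id: 10, user_id: 1, recorded_at: "2015-03-01", category: "a", perception_score: 7 },
  { id: 11, user_id: 1, recorded_at: "2015-03-02", category: "b", perception_score: 9 },
  { id: 12, user_id: 2, recorded_at: "2015-03-03", category: "a", perception_score: 4 },
]

# INNER JOIN notes ON users.id = notes.user_id, projected onto the
# temp_user_notes columns -- one row per matching (user, note) pair.
temp_user_notes = notes.map do |note|
  user = users.find { |u| u[:id] == note[:user_id] }
  {
    user_id: user[:id],
    notes_id: note[:id],
    auth_token_created_at: user[:auth_token_created_at],
    notes_recorded_at: note[:recorded_at],
    category: note[:category],
    well_being_score: note[:perception_score],
  }
end
# temp_user_notes.size == 3
```

In a Rails app you would run the actual SQL once with `ActiveRecord::Base.connection.execute`, letting the database build all rows in a single statement instead of instantiating any Ruby objects.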
Upvotes: 4
Reputation: 15944
I am trying to create a temporary table temp_user_notes which can have data of both the tables on which I can perform ActiveRecord queries to fetch the data.
The more standard way of doing this is simply to join the two tables. If you use the following clause:
User.joins(:notes)
then this will give you a scope that can be used for further querying (you can e.g. add where conditions on columns of both tables). Note that the returned records are User records; if you also need Note columns in the result, add an explicit select. For example:
User.joins(:notes).where("notes.impact_score > 10")
will give you all users that have at least one note with a high enough score.
If you need to go through all the joined records, you should use the find_each method, which returns the records in batches.
Upvotes: 0
Reputation: 17802
With huge amounts of data, it is almost never beneficial to use User.all. Imagine extracting 50k rows of data and then, for each row, building a brand-new User object populated with that row's data. Yes, it's going to cause you trouble, and Rails is aware of it.
You can use:
User.find_each do |user|
# do your thing
end
find_each yields the records in batches of 1000 at a time, reducing memory pressure. You can override the default batch size by passing batch_size to find_each:
User.find_each(batch_size: 3000) do |user|
# do your thing
end
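Conceptually, find_each pages through the table in fixed-size slices rather than loading everything at once; a plain-Ruby sketch of the batching:

```ruby
ids = (1..2500).to_a

# each_slice mirrors how find_each walks the table: only one batch of
# records needs to be in memory at a time.
batch_sizes = ids.each_slice(1000).map(&:size)
# batch_sizes == [1000, 1000, 500]
```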
Upvotes: 1