Reputation: 2677
I'm using the following method to perform a bulk insert, and to optionally avoid inserting duplicates, with SQLAlchemy:
def bulk_insert_users(self, users, allow_duplicates=False):
    if not allow_duplicates:
        users_new = []
        for user in users:
            if not self.SQL_IO.db.query(User_DB.id).filter_by(user_id=user.user_id).scalar():
                users_new.append(user)
        users = users_new
    self.SQL_IO.db.bulk_save_objects(users)
    self.SQL_IO.db.commit()
Can the above functionality be implemented such that the function is faster?
Upvotes: 1
Views: 1552
Reputation: 40901
How many users do you have? You're querying for the users one at a time, on every single iteration of that loop. You might have more luck querying for ALL user IDs up front, putting them in a collection, and then checking against that collection:
existing_user_ids = {uid for (uid,) in self.SQL_IO.db.query(User_DB.user_id)}  # query for all user IDs
for user in new_users:
    if user.user_id not in existing_user_ids:
        # do_stuff
Upvotes: 1
Reputation: 318508
You can load all user IDs first, put them into a set, and then use user.user_id in existing_user_ids
to decide whether to add a new user, instead of sending a SELECT query every time. Even with tens of thousands of users this will be quite fast, especially compared to contacting the database once per user.
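A minimal sketch of that set-based approach. The filtering logic is shown with plain dicts standing in for ORM objects so it runs stand-alone; in the real code, `existing_user_ids` would come from a single SQLAlchemy query against the question's `User_DB` model (shown in a comment, assuming the question's `self.SQL_IO.db` session):

```python
# Set-based de-duplication: one query for all existing IDs,
# then O(1) membership checks instead of one SELECT per user.

def filter_new_users(users, existing_user_ids):
    """Keep only users whose user_id is not already in the database."""
    return [u for u in users if u["user_id"] not in existing_user_ids]

# In the question's code this would be built with one query, e.g.:
#   existing_user_ids = {uid for (uid,) in self.SQL_IO.db.query(User_DB.user_id)}
existing_user_ids = {1, 2, 3}

users = [{"user_id": 2}, {"user_id": 4}, {"user_id": 5}]
print(filter_new_users(users, existing_user_ids))
# Only user_ids 4 and 5 survive; 2 already exists.
```

The remaining users can then be passed to `bulk_save_objects()` and committed exactly as in the question.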
Upvotes: 2