Boa

Reputation: 2677

Faster way to perform bulk insert, while avoiding duplicates, with SQLAlchemy

I'm using the following method to perform a bulk insert with SQLAlchemy, optionally skipping duplicates:

def bulk_insert_users(self, users, allow_duplicates=False):
    if not allow_duplicates:
        users_new = []

        # One SELECT per user to check whether that user_id already exists
        for user in users:
            if not self.SQL_IO.db.query(User_DB.id).filter_by(user_id=user.user_id).scalar():
                users_new.append(user)

        users = users_new

    self.SQL_IO.db.bulk_save_objects(users)
    self.SQL_IO.db.commit()

Can the above functionality be implemented so that the function runs faster?

Upvotes: 1

Views: 1552

Answers (2)

sytech

Reputation: 40901

How many users do you have? You're querying for the users one at a time, on every single iteration of that loop. You might have more luck querying for ALL user IDs once, putting them in a list, then checking against that list.

# One query for all existing user IDs (User_DB / SQL_IO names taken from the question)
existing_user_ids = [uid for (uid,) in self.SQL_IO.db.query(User_DB.user_id)]

users_new = []
for user in users:
    if user.user_id not in existing_user_ids:
        users_new.append(user)

Upvotes: 1

ThiefMaster

Reputation: 318508

You can load all user ids first, put them into a set, and then use user.user_id in existing_user_ids to decide whether to add a new user, instead of sending a SELECT query every time. Even with tens of thousands of users this will be quite fast, especially compared to contacting the database for each user.
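A minimal sketch of this approach, reusing the User_DB model and SQL_IO session names from the question (those names are assumptions here, not part of this answer):

def bulk_insert_users(self, users, allow_duplicates=False):
    if not allow_duplicates:
        # Single SELECT for all existing IDs; set membership tests are O(1)
        existing_user_ids = {uid for (uid,) in self.SQL_IO.db.query(User_DB.user_id)}
        users = [user for user in users if user.user_id not in existing_user_ids]

    self.SQL_IO.db.bulk_save_objects(users)
    self.SQL_IO.db.commit()

This replaces one SELECT per user with a single query, and the set keeps each duplicate check constant-time.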

Upvotes: 2
