ncocacola
ncocacola

Reputation: 485

Insert or update duplicate key in SQLAlchemy (IntegrityError)

I am trying to parse some RSS feeds and write the URLs to a database using flask-sqlalchemy.

Some of these feeds overlap (i.e. the same article appears in multiple feeds or multiple times in the same feed), so I've defined the primary key to be a (non-random) hash of the URL.

My problem is that when I am looping through the URLs and adding them to the Session, I hit an exception when I try to commit them to the database.

Here's my code:

for entry in feed.entries:
    # Create the database row
    article = Article(entry)

    # If the row already exists in the database
    if db.session.query(Article).filter_by(uuid=article.uuid).first():
        print "duplicate"
    else:
        db.session.merge(article)

db.session.commit()

When the article already exists in the database, it is ignored. However, if it exists in the Session but hasn't been committed the database, then SQLAlchemy tries to write it multiple times in the same transaction, and I get sqlalchemy.exc.IntegrityError: (IntegrityError) column hash is not unique.

My intuition is that I need to check that an object with that hash doesn't already exist in the session (as well as querying the database), and I thought that's what merge would do, but I still get the error.

Upvotes: 0

Views: 4135

Answers (1)

davidism
davidism

Reputation: 127200

You should use the Unique Object pattern from the examples.

This creates an in memory cache for the session, and guarantees one unique result. If the instance is in the cache, use it, otherwise try to get it from the database. If it's not in the database, create it. Add the instance to the cache.

Here's my take on the pattern, which simplifies the linked example.

class UniqueMixin(object):
    @classmethod
    def get_unique(cls, **kwargs):
        session = current_app.extensions['sqlalchemy'].db.session
        session._unique_cache = cache = getattr(session, '_unique_cache', {})
        key = (cls, tuple(kwargs.items()))
        o = cache.get(key)

        if o is None:
            o = session.query(cls).filter_by(**kwargs).first()

            if o is None:
                o = cls(**kwargs)
                session.add(o)

            cache[key] = o

        return o

class MyModel(UniqueMixin, db.Model):
    # ...
    name = db.Column(db.String, nullable=False)
    # ...

m1 = MyModel.get_unique(name='test')  # new instance
m2 = MyModel.get_unique(name='test')  # from cache
assert m1 is m2
db.session.commit()  # inserts one row

m1 = MyModel.get_unique(name='test')  # from database
m2 = MyModel.get_unique(name='test')  # from cache
assert m1 is m2

Inherit UniqueMixin in any model that might encounter this situation, and use get_unique instead of the normal constructor.

Upvotes: 6

Related Questions