Reputation: 485
I am trying to parse some RSS feeds and write the URLs to a database using flask-sqlalchemy
.
Some of these feeds overlap (i.e. the same article appears in multiple feeds or multiple times in the same feed), so I've defined the primary key to be a (non-random) hash of the URL.
My problem is that when I am looping through the URLs and adding them to the Session, I hit an exception when I try to commit them to the database.
Here's my code:
for entry in feed.entries:
# Create the database row
article = Article(entry)
# If the row already exists in the database
if db.session.query(Article).filter_by(uuid=article.uuid).first():
print "duplicate"
else:
db.session.merge(article)
db.session.commit()
When the article already exists in the database, it is ignored. However, if it exists in the Session but hasn't been committed the database, then SQLAlchemy tries to write it multiple times in the same transaction, and I get sqlalchemy.exc.IntegrityError: (IntegrityError) column hash is not unique
.
My intuition is that I need to check that an object with that hash doesn't already exist in the session (as well as querying the database), and I thought that's what merge would do, but I still get the error.
Upvotes: 0
Views: 4135
Reputation: 127200
You should use the Unique Object pattern from the examples.
This creates an in memory cache for the session, and guarantees one unique result. If the instance is in the cache, use it, otherwise try to get it from the database. If it's not in the database, create it. Add the instance to the cache.
Here's my take on the pattern, which simplifies the linked example.
class UniqueMixin(object):
@classmethod
def get_unique(cls, **kwargs):
session = current_app.extensions['sqlalchemy'].db.session
session._unique_cache = cache = getattr(session, '_unique_cache', {})
key = (cls, tuple(kwargs.items()))
o = cache.get(key)
if o is None:
o = session.query(cls).filter_by(**kwargs).first()
if o is None:
o = cls(**kwargs)
session.add(o)
cache[key] = o
return o
class MyModel(UniqueMixin, db.Model):
# ...
name = db.Column(db.String, nullable=False)
# ...
m1 = MyModel.get_unique(name='test') # new instance
m2 = MyModel.get_unique(name='test') # from cache
assert m1 is m2
db.session.commit() # inserts one row
m1 = MyModel.get_unique(name='test') # from database
m2 = MyModel.get_unique(name='test') # from cache
assert m1 is m2
Inherit UniqueMixin in any model that might encounter this situation, and use get_unique instead of the normal constructor.
Upvotes: 6