Reputation: 2006
I've been looking for solutions to this but I couldn't find anything on finding duplicates with SQLAlchemy.
I have a parent-child type relationship, and I'm looking to find all the duplicates in the children on a specific column.
I tried iterating over each parent and counting on the column, but it gave me results that didn't make sense.
parents = session.query(parent).all()
for parent in parents:
dups = session.query(child).filter_by(parentid=parent.id).group_by(child.foo_column).count()
if dups > 0:
# do action on duplicates
How can I get the duplicate children, or is there even a single query that could return all the duplicates?
EDIT: Table definitions:
class parent(Base):
__tablename__ = 'parent'
id = Column(Integer, primary_key=True)
class child(Base):
__tablename__ = 'child'
id = Column(Integer, primary_key=True)
parentid = Column(Integer, ForeignKey('parent.id'))
foo_column = Column(Integer, ForeignKey('foo.id'))
parent = relationship('parent',
backref=backref('children'))
foo = relationship('foo')
The foo_column
I'm interested in contains just integer id's, so a duplicate would just be where foo1.id == foo2.id
.
Upvotes: 1
Views: 4054
Reputation: 2574
What you are trying to achieve requires a self join. Think of how you would do it in SQL. Your query would look something like:
SELECT child_1.id as dup1, child_2.id as dup2
FROM child as child_1 JOIN child as child_2
ON child_1.parentid = child_2.parentid
WHERE child_1.foo_column = child_2.foo_column;
Translating that to SQL Alchemy is straightforward:
child_2 = aliased(child)
dups = session.query(child).
join(child_2, child.parentid == child_2.parentid).
filter(child.foo_column == child_2.foo_column).
with_entities(child.id.label('dup1'), child_2.id.label('dup2'))
Upvotes: 4