Niel
Niel

Reputation: 2006

SQLAlchemy find all duplicates in children

I've been looking for solutions to this but I couldn't find anything on finding duplicates with SQLAlchemy.

I have a parent-child type relationship, and I'm looking to find all the duplicates in the children on a specific column.

I tried iterating over each parent and counting on the column, but it gave me results that didn't make sense.

parents = session.query(parent).all()
for parent in parents:
    dups = session.query(child).filter_by(parentid=parent.id).group_by(child.foo_column).count()
    if dups > 0:
        # do action on duplicates

How can I get the duplicate children, or is there even a single query that could return all the duplicates?

EDIT: Table definitions:

class parent(Base):
     __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)

class child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parentid = Column(Integer, ForeignKey('parent.id'))
    foo_column = Column(Integer, ForeignKey('foo.id'))

    parent = relationship('parent',
                            backref=backref('children'))

    foo = relationship('foo')

The foo_column I'm interested in contains just integer id's, so a duplicate would just be where foo1.id == foo2.id.

Upvotes: 1

Views: 4054

Answers (1)

aa333
aa333

Reputation: 2574

What you are trying to achieve requires a self join. Think of how you would do it in SQL. Your query would look something like:

SELECT child_1.id as dup1, child_2.id as dup2
FROM child as child_1 JOIN child as child_2 
     ON child_1.parentid = child_2.parentid
WHERE child_1.foo_column = child_2.foo_column;

Translating that to SQL Alchemy is straightforward:

child_2 = aliased(child)
dups = session.query(child).
               join(child_2, child.parentid == child_2.parentid).
               filter(child.foo_column == child_2.foo_column).
               with_entities(child.id.label('dup1'), child_2.id.label('dup2'))

Upvotes: 4

Related Questions