Larry Lustig
Larry Lustig

Reputation: 51000

Efficiently delete nested records using SQLAlchemy

I'm trying to delete a large set of nested records from SQLAlchemy. To do so, I need to query the data and then use the session to delete it. My problem is that I'm running out of memory on the query.

Specifically, I have classes CA, CB, CC, and CD. CA uses name as the primary key and CB has an autoincrement CBId column as its primary key. An instance of CA contains about 1000 instances of CB which, in turn contains 1000 instances of CC and about eight million instances of CD.

What I want to emit to the database is, essentially:

 DELETE FROM CCTable WHERE CBId IN (SELECT CBId FROM CBTable WHERE Name = ?)
 DELETE FROM CDTable WHERE CBId IN (SELECT CBId FROM CBTable WHERE Name = ?)
 DELETE FROM CBTable WHERE Name = ?
 DELETE FROM CATable WHERE Name = ?

What I'm currently trying is:

    data = self._context.query(CA) \
            .filter_by(name = name_to_delete) \
            .options(
                joinedload('CB').options(joinedload('CC'), joinedload('CD')) 
            ).first() 
    if data:
        context.delete(data)
        self._context.commit()

This fails, not unexpectedly, with OUT OF MEMORY.

How can I accomplish my goal in SQLAlchemy?

Upvotes: 0

Views: 336

Answers (1)

van
van

Reputation: 77012

The ON DELETE CASCADE option mentioned in the comments could be a solution on the database level.

Otherwise, your query basically loads the whole network of the objects starting from CA into memory, which is the most likely cause of your issue. Furthermore, even if it was not, by doing context.delete(data) you are only deleting the instance of CA as data will be a CA instance, and unless your relationships define correctly cascade, the related rows will not be deleted in the database.

And if you have configured those properly, the issue of a need to load everything into memory in order to delete it completely is not the approach one should take.

Solution: delete directly on the database (without data load)

With the example model defined below:

class CA(Base):
    __tablename__ = "ca_table"
    name = Column(String, primary_key=True)

    cbs = relationship("CB", back_populates="ca")


class CB(Base):
    __tablename__ = "cb_table"
    id = Column(Integer, primary_key=True)
    name = Column(String, index=True)
    name = Column(ForeignKey(CA.name))

    ca = relationship(CA, back_populates="cbs")
    ccs = relationship("CC", back_populates="cb")
    cds = relationship("CD", back_populates="cb")


class CC(Base):
    __tablename__ = "cc_table"
    id = Column(Integer, primary_key=True)
    name = Column(String, index=True)
    cb_id = Column(ForeignKey(CB.id))

    cb = relationship(CB, back_populates="ccs")


class CD(Base):
    __tablename__ = "cd_table"
    id = Column(Integer, primary_key=True)
    name = Column(String, index=True)
    cb_id = Column(ForeignKey(CB.id))

    cb = relationship(CB, back_populates="cds")

I would implement the deletion as per below:

def delete_a(session, name_to_delete):
    q = session.query(CD).filter(CD.cb.has(CB.name == name_to_delete))
    q.delete(synchronize_session=False)

    q = session.query(CC).filter(CC.cb.has(CB.name == name_to_delete))
    q.delete(synchronize_session=False)

    q = session.query(CB).filter(CB.name == name_to_delete)
    q.delete(synchronize_session=False)

    q = session.query(CA).filter(CA.name == name_to_delete)
    q.delete(synchronize_session=False)

    session.commit()


delete_a(session, "some_a_name")

Upvotes: 1

Related Questions