Reputation: 3
Disclaimer: I'm a database noob and this is my first question on StackOverflow. Happy to edit to make it as clear and up to code as possible.
In my application I have two tables: InputData (external data downloaded by my app) and OutputData (data created by my app). One or more InputData can be used to create one or more OutputData and when an InputData item is deleted, I want to delete all OutputData that was created using the deleted item as an input. Essentially a many-to-many association with cascade deletions.
Luckily for me, or so I thought, there is an exact example of what I want to do in the SQLAlchemy docs. However, after implementing this and working with it a bit, I've found it to be remarkably slow.
I did some benchmarking and found that 100,000 InputData, each with a single OutputData child (for a total of 100,000 OutputData) took almost 10 minutes to delete. In contrast, a one-to-many model with cascade deletes I have takes only 30 seconds to delete tables orders of magnitudes larger.
After a good deal of research, I have really only two ideas:
db.session.query(InputData).delete()
db.session.commit()
My tables (exactly what is from the docs with different names):
association_table = Table('association', Base.metadata,
Column('input_data_id', Integer, ForeignKey('input_data.id', ondelete="CASCADE")),
Column('output_data_id', Integer, ForeignKey('output_data.id', ondelete="CASCADE"))
)
class InputData(Base):
__tablename__ = 'input_data'
id = Column(Integer, primary_key=True)
children = relationship(
"OutputData",
secondary=association_table,
back_populates="inputs",
cascade="all, delete",
)
class OutputData(Base):
__tablename__ = 'output_data'
id = Column(Integer, primary_key=True)
parents = relationship(
"InputData",
secondary=association_table,
back_populates="outputs",
passive_deletes=True
)
Thank you in advance!
Upvotes: 0
Views: 304
Reputation: 24613
well, delete cascade by nature is per record, which makes delete operation super slow (or update)
whereas you can have separate delete statements to delete from tables per set which is way faster than cascade delete.
so while cascade delete is more convenient for deleting one record for not a lot of records at the same time, manual deletion will be better approach, if you are deleting a set of records.
either way , adding index on foreign key will increase the performance of deletion.
Upvotes: 1