jpmc26
jpmc26

Reputation: 29934

Delete rows without a related record using SQLAlchemy

I have 2 tables; we'll call them table1 and table2. table2 has a foreign key to table1. I need to delete the rows in table1 that have zero child records in table2. The SQL to do this is pretty straightforward:

DELETE FROM table1
WHERE 0 = (SELECT COUNT(*) FROM table2 WHERE table2.table1_id = table1.table1_id);

However, I haven't been able to find a way to translate this query to SQLAlchemy. Trying the straightforward approach:

subquery = session.query(sqlfunc.count(Table2).label('t2_count')).select_from(Table2).filter(Table2.table1_id == Table1.table1_id).subquery()
session.query(Table1).filter(0 == subquery.columns.t2_count).delete()

Just yielded an error:

sqlalchemy.exc.ArgumentError: Only deletion via a single table query is currently supported

How can I perform this DELETE with SQLAlchemy?

Upvotes: 3

Views: 5549

Answers (2)

Ian Wilson
Ian Wilson

Reputation: 9079

I'm pretty sure this is what you want. You should try it out though. It uses EXISTS.

from sqlalchemy.sql import not_

# This fetches rows in python to determine which ones were removed.
Session.query(Table1).filter(not_(Table1.table2s.any())).delete(
    synchronize_session='fetch')

# If you will not be referencing more Table1 objects in this session then you
# can just ignore syncing the session.
Session.query(Table1).filter(not_(Table1.table2s.any())).delete(
    synchronize_session=False)

Explanation of the argument for delete():

http://docs.sqlalchemy.org/en/rel_0_8/orm/query.html#sqlalchemy.orm.query.Query.delete

Example with exists(using any() above uses EXISTS):

http://docs.sqlalchemy.org/en/rel_0_8/orm/tutorial.html#using-exists

Here is the SQL that should be generated:

DELETE FROM table1 WHERE NOT (EXISTS (SELECT 1 
FROM table2 
WHERE table1.id = table2.table1_id))

If you are using declarative I think there is a way to access Table2.table and then you could just use the sql layer of sqlalchemy to do exactly what you want. Although you run into the same issue of making your Session out of sync.

Upvotes: 3

jpmc26
jpmc26

Reputation: 29934

Well, I found one very ugly way to do it. You can do a select with a join to get the rows loaded into memory, then you can delete them individually:

subquery = session.query(Table2.table1_id
                        ,sqlalchemy.func.count(Table2.table2_id).label('t1count')
                        ) \
                  .select_from(Table2) \
                  .group_by(Table2.table1_id) \
                  .subquery()
rows = session.query(Table1) \
              .select_from(Table1) \
              .outerjoin(subquery, Table1.table1_id == subquery.c.table1_id) \
              .filter(subquery.c.t1count == None) \
              .all()
for r in rows:
    session.delete(r)

This is not only nasty to write, it's also pretty nasty performance-wise. For starters, you have to bring the table1 rows into memory. Second, if you were like me and had a line like this on Table2's class definition:

table1 = orm.relationship(Table1, backref=orm.backref('table2s'))

then SQLAlchemy will actually perform a query to pull the related table2 rows into memory, too (even though there aren't any). Even worse, because you have to loop over the list (I tried just passing in the list; didn't work), it does so one table1 row at a time. So if you're deleting 10 rows, it's 21 individual queries (1 for the initial select, 1 for each relationship pull, and 1 for each delete). Maybe there are ways to mitigate that; I would have to go through the documentation to see. All this for things I don't even want in my database, much less in memory.

I won't mark this as the answer. I want a cleaner, more efficient way of doing this, but this is all I have for now.

Upvotes: 1

Related Questions