John Hinnegan
John Hinnegan

Reputation: 5962

Finding records not in a join -- Efficiently

I've seen this question a few times, but all the answers rely on having small data sets or limited associations, so here goes again.

I have a has_many_through relationship

class ModelA
  has_many :model_c, :through => :model_b
  has_many :model_b

class ModelB
  belongs_to :model_a
  belongs_to :model_b

class ModelC
  has_many :model_a, :through => :model_b
  has_many :model_b

I need to clean up 'orphaned' Model B's and Model C's -- in which the Model A has been deleted. So I need to find and delete Model B's without a Model A, and then Model C's without a model B.

We have not been doing this when deleting Model A's because one Model A can be associated to tens of thousands of Model C's and it was taking far too long -- our use of the dependent destroy with a guard block fired off way too many queries and took way too long. We're planning to instead do a nightly cleanup job now.

So, I need to find and delete all ModelB's whose model_a columns contains an ID for a now-non-existant Model A, and delete them.

Solutions I've found so far that won't work for me (because we might have 100's of thousands of orphans at a given time): - loading all the ids from ModelA.id and ModelB.model_a and doing a set difference - loading all the ids from ModelA.id and passing to a "NOT IN" query on Model B - checking from model C record-by-record

At this point, I think I need a straight SQL solution (which I'm fine with), but I'm not quite sure what I'm looking for.

(DB is MySQL)

Upvotes: 1

Views: 65

Answers (1)

Conrad Frix
Conrad Frix

Reputation: 52645

I haven't tested it but I believe this will work if understood MySQL Multiple-table Delete Syntax

DELETE 
     ModelC, 
      ModelB 
FROM ModelC 
     LEFT JOIN ModelB 
     ON ModelC.BID = ModelB.BID
     LEFT JOIN ModelA
     ON ModelB.AID = ModelA.AID
WHERE
     ModelB.BID is NULL
     OR ModelA.AID IS NULL

Upvotes: 3

Related Questions