ActiveRecord bulk data, memory grows forever

I am using ActiveRecord to bulk migrate data from a table in one database to a different table in another database, about 4 million rows in total.

I am using find_each to fetch in batches. Then I apply a little bit of logic to each record fetched and write it to a different database. I have tried both writing records one by one and using the nice activerecord-import gem to write in batches.
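For reference, the loop described above has roughly the following shape. This is only a plain-Ruby sketch, not my actual code: `migrate_in_batches`, `transform` and `write` are hypothetical names, `each_slice` stands in for the batching that find_each does, and the `write` lambda stands in for a bulk insert such as the one activerecord-import provides. No ActiveRecord is loaded here.

```ruby
# Plain-Ruby stand-in for the read/transform/write loop (no ActiveRecord).
# source    - any Enumerable; plays the role of the find_each cursor
# transform - hypothetical per-record logic
# write     - hypothetical bulk writer, called once per batch
def migrate_in_batches(source, batch_size: 1000, transform:, write:)
  written = 0
  source.each_slice(batch_size) do |batch|
    rows = batch.map { |record| transform.call(record) }
    write.call(rows)      # one bulk write per batch
    written += rows.size  # keep only a counter, not the rows themselves
  end
  written
end

# Usage: 2500 fake "records" are written as three batches.
batch_sizes = []
total = migrate_in_batches(
  1..2500,
  batch_size: 1000,
  transform: ->(record) { record * 2 },
  write: ->(rows) { batch_sizes << rows.size }
)
```

With a loop of this shape, nothing outside the block holds a reference to a batch once it has been written, so each batch should become collectable as soon as the next one starts.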

However, in either case, my Ruby process's memory usage grows steadily throughout the life of the export/import. I would have thought that with find_each fetching batches of 1000, only 1000 records would be in memory at a time. But no: each record I fetch seems to consume memory forever, until the process is over.

Any ideas? Is ActiveRecord caching something somewhere that I can turn off?

update 17 Jan 2012

I think I'm going to give up on this. I have tried:

* Making sure everything is wrapped in an ActiveRecord::Base.uncached do block
* Adding ActiveRecord::IdentityMap.enabled = false (I think that should turn off the identity map for the current thread, although it's not clearly documented, and I think the identity map isn't on by default in current Rails anyhow)

Neither of those seems to have much effect; memory is still leaking.

I then added a periodic explicit:

GC.start

That seems to slow the rate of the leak, but memory still grows (eventually exhausting all available memory and crashing the process).
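A minimal sketch of what "periodic GC.start" means here, with a sample taken after each collection so the growth can be watched. The helper name `with_periodic_gc` and the `every` parameter are hypothetical, and the `:heap_live_slots` key assumes a reasonably modern Ruby (older versions expose differently named keys in `GC.stat`).

```ruby
# Run a block for each batch, forcing a full GC every `every` batches and
# recording how many live object slots remain after each collection.
def with_periodic_gc(batches, every: 10)
  samples = []
  batches.each_with_index do |batch, index|
    yield batch
    next unless ((index + 1) % every).zero?

    GC.start                                  # explicit full collection
    samples << GC.stat[:heap_live_slots]      # Ruby objects still alive
  end
  samples
end

# Usage: 25 small fake batches, collected after batches 10 and 20.
fake_batches = Array.new(25) { |i| [i, i + 1] }
samples = with_periodic_gc(fake_batches, every: 10) do |batch|
  batch.map(&:to_s)  # placeholder for the real per-batch work
end
```

If the samples stay flat while the process's resident memory keeps climbing, that would point away from retained Ruby objects and toward memory that this counter does not cover, such as allocations made inside a native extension.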

So I think I'm giving up and concluding that it is not currently possible to use AR to read millions of rows from one db and insert them into another. Perhaps there is a memory leak in the MySQL-specific code (MySQL is my db), or somewhere else in AR, or who knows.

Upvotes: 5
