Reputation: 1526
I am using Doctrine and MongoDB for an application, and there is one task that should import data from a CSV file into a collection. There are about 5 different CSV files with at least 450,000 entries per file that should be imported 1 to 3 times per year.
Currently I iterate through all lines of a file, create an object, call persist() and flush every 2,500 items.
Each item is not very large: it has an ID, a string of 10-20 characters, a string of 5-10 characters and a boolean value.
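Simplified, the import loop looks roughly like this (the document class, its setters, the column order and the file path are just placeholders):

```php
<?php
// $dm is the Doctrine MongoDB ODM DocumentManager.
$batchSize = 2500;
$i = 0;

if (($handle = fopen('/path/to/import.csv', 'r')) !== false) {
    while (($row = fgetcsv($handle)) !== false) {
        $item = new ImportItem();           // placeholder document class
        $item->setCode($row[1]);            // 10-20 character string
        $item->setShortCode($row[2]);       // 5-10 character string
        $item->setActive((bool) $row[3]);   // boolean value

        $dm->persist($item);

        if (++$i % $batchSize === 0) {
            $dm->flush();
        }
    }
    $dm->flush(); // flush the remaining documents
    fclose($handle);
}
```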
My first question: when I flush every 5,000 items, inserting gets significantly slower. In my test environment, flushing 5,000 items took about 102 seconds, while flushing 2,500 items took about 10 seconds.
Flushing also gets slower after a while. As mentioned, at the beginning flushing 2,500 items took about 10 seconds; after 100,000 items, flushing 2,500 items takes nearly one minute.
Are there any optimisations I can do to speed things up?
Upvotes: 2
Views: 2014
Reputation: 47585
I think there are two parts that may be optimized in your script.
The first is the way you read the file: you can either load it entirely into memory (with file_get_contents() or file() for example), or read it chunk by chunk with fopen() and fread(). The last option is preferred because it will only take the necessary amount of memory when processing a bunch of lines.
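As a rough sketch of the difference (the file path is only a placeholder):

```php
<?php
// Loads the whole file into memory at once - with 450,000 lines this can
// consume a lot of memory before any processing even starts.
$lines = file('/path/to/import.csv');

// Reads the file one line at a time - only the current line is held in memory.
$handle = fopen('/path/to/import.csv', 'r');
while (($line = fgets($handle)) !== false) {
    // process $line here (e.g. explode(',', $line), or use fgetcsv() directly)
}
fclose($handle);
```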
The second is that you need to clear() the objects you have already processed, otherwise they will be kept in memory until the end of your script. This means that if one DC2 object uses 1 MB of memory and you have 100,000 objects, at the end of your script it will use 100,000 MB.
So batching your inserts in ranges of 2,500 is quite a good idea, but you obviously need to remove the processed objects from the EntityManager. This can be done using $entityManager->clear();
The clear() call will clear the whole EntityManager; if you only want to clear a single entity, you may use $entityManager->detach($object).
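A minimal sketch of that batching pattern could look like this (using the same $entityManager variable as above, which with the MongoDB ODM is actually a DocumentManager, and a placeholder document class):

```php
<?php
$batchSize = 2500;
$i = 0;

$handle = fopen('/path/to/import.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    $item = new ImportItem();   // placeholder document class
    // ... hydrate $item from $row ...
    $entityManager->persist($item);

    if (++$i % $batchSize === 0) {
        $entityManager->flush();
        // Detach all managed objects so they can be garbage collected
        // instead of accumulating until the end of the script.
        $entityManager->clear();
    }
}
$entityManager->flush();
$entityManager->clear();
fclose($handle);
```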
If you want to profile your memory usage, you may also be interested in the functions memory_get_usage() and memory_get_peak_usage().
Upvotes: 3
Reputation: 2964
Is there any reason this import script needs to use the Doctrine ODM? For the application as a whole - sure. But handling big amounts of data with so much overhead is not optimal, as PHP is not very efficient at holding and handling big collections.
The best optimization you could do is to skip that layer, use the MongoDB classes directly, and make sure you read the file line by line rather than all at once. Also, using batchInsert() will speed you up a lot if you have some network latency to MongoDB (but you never need to push it to 5k documents).
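A rough sketch of what that could look like with the legacy PHP Mongo driver (the database, collection, field names and CSV layout are only assumptions):

```php
<?php
$mongo      = new MongoClient();
$collection = $mongo->mydb->items;   // placeholder database/collection names

$batch     = [];
$batchSize = 1000;

$handle = fopen('/path/to/import.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    // Build plain arrays instead of ODM documents - no hydration overhead.
    $batch[] = [
        'code'      => $row[1],
        'shortCode' => $row[2],
        'active'    => (bool) $row[3],
    ];

    if (count($batch) >= $batchSize) {
        $collection->batchInsert($batch);
        $batch = [];
    }
}
if ($batch) {
    $collection->batchInsert($batch); // insert the remaining documents
}
fclose($handle);
```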
Upvotes: 1