Reputation: 971
In my Symfony2 command, I am running a script that inserts hundreds of thousands of URLs (as strings) into MongoDB documents.
Here are the basic structures of the two documents I'm using. Before the program runs, there are already thousands of ParentDocuments in MongoDB, but zero ChildDocuments:
ParentDocument:
    $id: id
    $subDocument: OneToManyReference(ChildDocument)
    $etc: everythingelse
ChildDocument:
    $id: id
    $url: string
    $parentDocument: ManyToOneReference(ParentDocument)
And my Command code:
$dm = $this->getContainer()->get('doctrine_mongodb.odm.document_manager');
$parentDocuments = $dm->getRepository('My:Bundle:ParentDocument')->findAll();
while ($parentDocument = $parentDocuments->getNext()) {
    // Returns an array of hundreds of thousands of URLs
    $urls = $this->somehowFetchUrlsRelatedToTheParentDocument($parentDocument);
    foreach ($urls as $url) {
        $subDocument = new ChildDocument();
        $subDocument->setUrl($url);
        $subDocument->setParentDocument($parentDocument);
        $dm->persist($subDocument);
    }
    // One flush per parent: every child persisted above is written here
    $dm->flush();
}
When I run this simple command, writes are incredibly fast at first. However, when inserting millions of documents, the write speed drops significantly: after the command has been running for 10 minutes it is as slow as 1 write per second, which makes the code extremely inefficient.
My first attempt at fixing this problem was to clear the document manager right after each flush using $dm->clear();
But this meant that the document manager would lose track of the current ParentDocument. So my solution was this:
$dm = $this->getContainer()->get('doctrine_mongodb.odm.document_manager');
$parentDocumentCursors = $dm->getRepository('My:Bundle:ParentDocument')->findAll();
// Copy every parent out of the cursor before the document manager is cleared
$parentDocuments = array();
while ($parentDocument = $parentDocumentCursors->getNext()) {
    array_push($parentDocuments, $parentDocument);
}
$dm->clear();
unset($dm);
$dm = $this->getContainer()->get('doctrine_mongodb.odm.document_manager');
foreach ($parentDocuments as $parentDocument) {
    $urls = $this->somehowFetchUrlsRelatedToTheParentDocument($parentDocument);
    foreach ($urls as $url) {
        $subDocument = new ChildDocument();
        $subDocument->setUrl($url);
        $subDocument->setParentDocument($parentDocument);
        $dm->persist($subDocument);
    }
    // Write this parent's children, then drop them from the document manager
    $dm->flush();
    $dm->clear();
}
This solved the problem. Write speeds stayed consistently fast throughout the whole run, and millions of documents were inserted without any gradual slowdown.
However, this feels like bad practice and a quick-fix hack. What is the best practice for inserting millions of documents in Symfony2 using the document manager without read/write speeds becoming slow?
Upvotes: 0
Views: 4263
Reputation: 306
In order to do a bulk insert in Doctrine, you would need to move your flush outside of your loop. Consider the scenario below, where you persist inside the foreach and flush once the foreach has completed. The only catch is that you will not be able to query any of the data being inserted in the batch until after the flush.
$dm = $this->getContainer()->get('doctrine_mongodb.odm.document_manager');
foreach ($parentDocuments as $parentDocument) {
    $urls = $this->somehowFetchUrlsRelatedToTheParentDocument($parentDocument);
    foreach ($urls as $url) {
        $subDocument = new ChildDocument();
        $subDocument->setUrl($url);
        $subDocument->setParentDocument($parentDocument);
        $dm->persist($subDocument);
    }
}
// Single flush after everything has been persisted
$dm->flush();
$dm->clear();
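If holding every persisted document in memory until a single final flush is a concern, a middle ground is to flush and clear every N documents. The following is only a sketch of that idea: the function name persistUrlsInBatches, the $makeChild factory and the batch size of 1000 are made up for illustration, and $dm is assumed to behave like Doctrine's DocumentManager (persist(), flush(), clear(), getReference()). It uses getReference() to obtain a managed handle to the parent again after each clear(), instead of keeping the parents in a separate array.

```php
<?php

/**
 * Persist one child per URL, flushing and clearing every $batchSize documents.
 *
 * Assumptions: $dm behaves like Doctrine's DocumentManager, and $makeChild is
 * a hypothetical factory that builds a child document from a URL and a parent.
 */
function persistUrlsInBatches($dm, $parentClass, $parentId, $urls, $makeChild, $batchSize = 1000)
{
    // Managed handle to the parent, looked up by class name and identifier.
    $parent = $dm->getReference($parentClass, $parentId);
    $pending = 0;

    foreach ($urls as $url) {
        $dm->persist($makeChild($url, $parent));

        if (++$pending === $batchSize) {
            $dm->flush();
            $dm->clear();
            // clear() detached everything, so ask for the parent again.
            $parent = $dm->getReference($parentClass, $parentId);
            $pending = 0;
        }
    }

    // Write whatever is left over from the last, partial batch.
    if ($pending > 0) {
        $dm->flush();
        $dm->clear();
    }
}
```

Inside the command this might be called once per parent, passing a closure that does the new ChildDocument(), setUrl() and setParentDocument() calls. The right batch size depends on document size and available memory, so 1000 is only a starting point.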
Another option is to do a push, pushAll, or addToSet. One thing to consider is that you will need to use a stdClass in PHP in order to add an object. I find this to be the quickest way to update a subdocument. For example:
$dm->createQueryBuilder('My:Bundle:ParentDocument')
    ->update()
    ->field('subDocument')->push( (object) array('url' => $url) )
    ->field('id')->equals( $parentDocumentId )
    ->getQuery()
    ->execute();
Upvotes: 1
Reputation: 36784
I would avoid using Symfony's document manager and use the batchInsert() function directly. This is described in the documentation at http://php.net/manual/en/mongocollection.batchinsert.php. It feels to me like Doctrine's ODM is actually hurting you here.
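A rough sketch of what that could look like, assuming the legacy PHP Mongo driver. The function name batchInsertUrls, the field names url and parentDocument, and the batch size are made up for illustration. Storing the parent as a plain id is also an assumption: by default Doctrine's ODM stores references as DBRefs, so the inserted shape would need to match your mapping if these documents are later read back through Doctrine.

```php
<?php

/**
 * Insert one raw document per URL, sending $batchSize documents per call.
 *
 * Assumptions: $collection is a MongoCollection from the legacy PHP driver,
 * and the field names below are hypothetical.
 */
function batchInsertUrls($collection, $parentId, array $urls, $batchSize = 1000)
{
    $inserted = 0;

    // array_chunk keeps each batchInsert() call to a bounded number of documents.
    foreach (array_chunk($urls, $batchSize) as $chunk) {
        $docs = array();
        foreach ($chunk as $url) {
            $docs[] = array('url' => $url, 'parentDocument' => $parentId);
        }

        $collection->batchInsert($docs);
        $inserted += count($docs);
    }

    return $inserted;
}
```

The collection itself could come from something like a new MongoClient() followed by selectCollection() with your own database and collection names. Since this bypasses the ODM entirely, no lifecycle events or mapping-based type conversion are applied to the inserted documents.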
Upvotes: 3