Reputation: 23
I was wondering if anyone has ever encountered this:
When inserting documents via AQL, I can easily kill my ArangoDB server. For example:
FOR i IN 1 .. 10
FOR u IN users
INSERT {
_from: u._id,
_to: CONCAT("posts/",CEIL(RAND()*2000)),
displayDate: CEIL(RAND()*100000000)
} INTO canSee
(where users contains 500,000 entries), the server gets killed.
OK, so I'm creating a lot of entries, and AQL might be implemented in a way that does this in bulk. Doing the writes via the db.save method works, but it is much slower.
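For reference, a simplified arangosh sketch of what I mean by the db.save approach (assuming canSee is an edge collection, as the _from/_to attributes suggest; the loop structure is just illustrative):

var users = db.users.all();                  // cursor over all user documents
while (users.hasNext()) {
  var u = users.next();
  for (var i = 0; i < 10; i++) {
    // edge-collection save(from, to, data); each save is its own small write
    db.canSee.save(
      u._id,
      "posts/" + Math.ceil(Math.random() * 2000),
      { displayDate: Math.ceil(Math.random() * 100000000) }
    );
  }
}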
I also suspect this might have to do with the write-ahead log filling up.
But still, is there a way I can fix this? Writing a lot of entries to a database should not necessarily kill it.
Logs say
DEBUG [./lib/GeneralServer/GeneralServerDispatcher.h:411] shutdownHandler called, but no handler is known for task
DEBUG [arangod/VocBase/datafile.cpp:949] created datafile '/usr/local/var/lib/arangodb/journals/logfile-6623368699310.db' of size 33554432 and page-size 4096
DEBUG [arangod/Wal/CollectorThread.cpp:1305] closing full journal '/usr/local/var/lib/arangodb/databases/database-120933/collection-4262707447412/journal-6558669721243.db'
Best
Upvotes: 2
Views: 465
Reputation: 9097
The above query will insert 5M documents into ArangoDB in a single transaction. This will take a while to complete, and while the transaction is still ongoing, it will hold lots of (potentially needed) rollback data in memory.
Additionally, the above query will first build up all the documents to insert in memory, and only once that's done will it start inserting them. Building all the documents will also consume a lot of memory. When executing this query, you will see memory usage steadily increasing until, at some point, the disk writes kick in as the actual inserts start.
There are at least two ways to improve this:
it might be beneficial to split the query into multiple, smaller transactions. Each transaction then won't be as big as the original one, and will not block as many system resources while it is ongoing (see the arangosh sketch after the second point below).
for the query above, it technically isn't necessary to build up all the documents to insert in memory first, and only after that insert them all. Instead, documents read from users could be inserted into canSee as they arrive. This won't speed up the query, but it will significantly lower memory consumption during query execution for result sets as big as the above. It will also lead to the writes starting immediately, and thus to write-ahead log collection starting earlier. Not all queries are eligible for this optimization, but some (including the above) are. I worked on a mechanism today that detects eligible queries and executes them this way. The change was pushed into the devel branch today, and will be available with ArangoDB 2.5.
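As an illustration of the first point, here is a minimal arangosh sketch of splitting the work into smaller batches (the batch size is an arbitrary example value; each db._query call runs in its own transaction):

var batchSize = 10000;                       // example value, tune as needed
var total = db.users.count();
for (var offset = 0; offset < total; offset += batchSize) {
  // each db._query call is executed as a separate, smaller transaction
  db._query(
    "FOR u IN users " +
    "  SORT u._key " +                       // make the paging deterministic
    "  LIMIT @offset, @count " +
    "  FOR i IN 1 .. 10 " +
    "    INSERT { _from: u._id, " +
    "             _to: CONCAT('posts/', CEIL(RAND() * 2000)), " +
    "             displayDate: CEIL(RAND() * 100000000) } INTO canSee",
    { offset: offset, count: batchSize }
  );
}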
Upvotes: 5