Reputation: 9571
I have a system with two processes, one of which does a single insert, and the other a bulk insert. Obviously the second process is faster, and I'm working on migrating the first process to a bulk insert mechanism, but I was stumped this morning by a question from a colleague about "why bulk insert would be faster than single inserts".
So indeed, why is bulk insert faster than single insert?
Also, are there differences between bulk and single inserts in MySQL and HBase, given that their database architectures are completely different? I am using both for my project, and am wondering if there are differences in the bulk and single inserts for these two databases.
Upvotes: 1
Views: 433
Reputation: 1574
In short: a bulk load bypasses the regular write path. That's why it is fast.
So what happens on the normal write path when you do simple row-by-row put operations?
Each write is recorded in both the WAL (write-ahead log) and the memstore, and when the memstore is full its contents are flushed to a new HFile.
A bulk load, however, hands ready-made StoreFiles directly to the running HBase cluster, with no intermediate steps.
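To make the difference concrete, here is a toy model in plain Java. It is not HBase code: the class WritePathSketch, its Stats counters and the memstoreLimit parameter are all made up for illustration. It only counts the work each path does per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of the two write paths described above; hypothetical, not HBase code. */
public class WritePathSketch {

    /** Counters for the work a write path performs. */
    static final class Stats {
        int walAppends;      // one per row on the normal path
        int memstoreInserts; // one per row on the normal path
        int flushes;         // times a full memstore was flushed
        int storeFiles;      // files that end up visible to readers
    }

    /** Normal path: each row goes to the WAL and the memstore; a full memstore is flushed. */
    static Stats normalPuts(List<String> rowKeys, int memstoreLimit) {
        Stats stats = new Stats();
        TreeMap<String, String> memstore = new TreeMap<>(); // sorted in-memory buffer
        for (String key : rowKeys) {
            stats.walAppends++;          // durability cost paid for every row
            memstore.put(key, "value");
            stats.memstoreInserts++;
            if (memstore.size() >= memstoreLimit) {
                memstore.clear();        // stands in for writing a new store file
                stats.flushes++;
                stats.storeFiles++;
            }
        }
        return stats;
    }

    /** Bulk load: the file was prepared ahead of time and is adopted as a whole. */
    static Stats bulkLoad(List<String> sortedRowKeys) {
        Stats stats = new Stats();
        stats.storeFiles = 1;            // no WAL appends, no memstore inserts, no flushes
        return stats;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(String.format("row-%04d", i));
        }
        Stats normal = normalPuts(keys, 100);
        Stats bulk = bulkLoad(keys);
        System.out.println("normal: walAppends=" + normal.walAppends + " flushes=" + normal.flushes);
        System.out.println("bulk:   walAppends=" + bulk.walAppends + " storeFiles=" + bulk.storeFiles);
    }
}
```

The model leaves out the cost of preparing the file in the first place; in a real bulk load that work is done outside the region servers' write path, which is where the saving comes from.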
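Quick tip: if you don't want to use bulk load (it is often run in short bursts, which puts additional load on the cluster), you can turn off WAL writes with Put.setWriteToWAL(false) to save some time. In newer client versions the equivalent is Put.setDurability(Durability.SKIP_WAL).
But this increases the risk of data loss: edits that are still only in the memstore are lost if the region server crashes before they are flushed.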
Upvotes: 1
Reputation: 8868
As far as I know, this also depends on the HBase configuration. Normally a bulk insert means passing a List of Puts together; in that case the insert (called flushing in the HBase layer) is done automatically when you call table.put. Single inserts might wait for other insert calls so that a batch flush can be done in the middle layer. However, this again depends on the configuration, such as the client-side autoflush setting.
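A rough way to see why sending a list at once helps is to count round trips. The sketch below is plain Java with a made-up FakeTable class standing in for a remote table. It is not the HBase client, which groups a list of Puts by region server, so the real number of calls depends on how the rows are spread across regions.

```java
import java.util.ArrayList;
import java.util.List;

/** Counts round trips for single vs. batched inserts; hypothetical, not the HBase client. */
public class BatchingSketch {

    /** Stand-in for a remote table: every call costs one round trip. */
    static final class FakeTable {
        int roundTrips;
        int rowsStored;

        void put(String row) {           // single insert: one trip per row
            roundTrips++;
            rowsStored++;
        }

        void put(List<String> rows) {    // batched insert: one trip per list
            roundTrips++;
            rowsStored += rows.size();
        }
    }

    /** Sends each row in its own call and returns the number of round trips. */
    static int singleInserts(FakeTable table, List<String> rows) {
        for (String row : rows) {
            table.put(row);
        }
        return table.roundTrips;
    }

    /** Sends rows in chunks of batchSize and returns the number of round trips. */
    static int batchedInserts(FakeTable table, List<String> rows, int batchSize) {
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            table.put(rows.subList(start, end));
        }
        return table.roundTrips;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("row-" + i);
        }
        System.out.println("single:  " + singleInserts(new FakeTable(), rows) + " round trips");
        System.out.println("batched: " + batchedInserts(new FakeTable(), rows, 250) + " round trips");
    }
}
```

The same amortization applies on the MySQL side: a multi-row INSERT or a JDBC batch pays the per-statement overhead once per batch rather than once per row.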
Another reason may be that the work is simpler to organize: MapReduce is more efficient when it has many tasks to run at a time. The placement of file chunks is decided once for all inputs, whereas with individual inserts that decision is made again for every insert, which becomes a significant cost.
Upvotes: 2