Reputation: 18299
With the HappyBase API for HBase in Python, a batch insert can be performed as follows:
import happybase
connection = happybase.Connection()
table = connection.table('table-name')
batch = table.batch()
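# put several rows into this batch via batch.put(), e.g. (hypothetical column family 'cf'):
batch.put(b'row-key-1', {b'cf:col': b'value-1'})
batch.put(b'row-key-2', {b'cf:col': b'value-2'})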
batch.send()
What would happen if this batch failed halfway through? Would the rows that had already been saved remain saved, and the remaining rows simply not be saved?
I noticed on the HappyBase GitHub page that the table.batch() method takes transaction and wal as parameters. Could these be configured so that the successfully saved rows are rolled back if the batch fails halfway through?
Will HappyBase raise an exception in that case, which would let me note the affected row keys and perform a batch delete?
Upvotes: 1
Views: 2307
Reputation: 4037
Did you follow the tutorial about batch mutations in the HappyBase docs? It looks like you're mixing up a few things here. https://happybase.readthedocs.org/en/latest/user.html#performing-batch-mutations
Batches are purely a performance optimization: they avoid round-tripping to the Thrift server for each row that is stored/deleted, which may result in a significant speedup.
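To illustrate why batching speeds things up, here is a minimal pure-Python model that only counts round trips. It is not HappyBase code: FakeServer, store_one_by_one and store_batched are hypothetical names, and the flushing logic merely mirrors the idea behind HappyBase's batch_size parameter (send automatically once that many mutations have accumulated). For simplicity the sketch counts rows rather than individual column mutations.

```python
# Hypothetical stand-in for a Thrift server: it stores rows and counts
# how many requests (network round trips) it receives.
class FakeServer:
    def __init__(self):
        self.round_trips = 0
        self.rows = {}

    def mutate_rows(self, mutations):
        # One call == one round trip, however many rows it carries.
        self.round_trips += 1
        self.rows.update(mutations)


def store_one_by_one(server, rows):
    for key, data in rows.items():
        server.mutate_rows({key: data})  # one round trip per row


def store_batched(server, rows, batch_size):
    pending = {}
    for key, data in rows.items():
        pending[key] = data
        if len(pending) >= batch_size:
            server.mutate_rows(pending)  # flush a full batch
            pending = {}
    if pending:
        server.mutate_rows(pending)      # flush whatever is left


rows = {f"row-{i}": {"cf:col": str(i)} for i in range(1000)}

a, b = FakeServer(), FakeServer()
store_one_by_one(a, rows)
store_batched(b, rows, batch_size=100)
print(a.round_trips, b.round_trips)  # 1000 10
```

Both servers end up with identical data; only the number of requests differs, which is the whole point of batching.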
The context manager behaviour (the with block), as explained with numerous examples in the user guide linked above, is a purely client-side convenience API that makes application code easier to write and reason about. If the with block completes successfully, all mutations are sent to the server in one go.
However, that's only the happy path. What happens if a Python exception is raised somewhere inside the with block? That's where the transaction flag comes into play: if True, no data is sent to the server at all; if False, any pending data is flushed anyway. Which behaviour is preferable depends strongly on your use case.
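The transaction flag's behaviour on exit from a with block can be modelled in a few lines. This is a hypothetical sketch (SketchBatch and write are illustrative names, and a plain dict stands in for the table), written to match the semantics described above rather than to reproduce HappyBase's source.

```python
class SketchBatch:
    """Hypothetical stand-in modelling the transaction flag of a
    HappyBase-style batch used as a context manager."""

    def __init__(self, store, transaction=False):
        self.store = store              # dict standing in for the HBase table
        self.transaction = transaction
        self.pending = {}               # mutations buffered on the client

    def put(self, row, data):
        self.pending[row] = data        # nothing reaches the "server" yet

    def send(self):
        self.store.update(self.pending)
        self.pending = {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None and self.transaction:
            return False  # exception + transaction=True: send nothing, re-raise
        self.send()       # success, or transaction=False: flush pending data
        return False      # never swallow the exception


def write(store, transaction):
    try:
        with SketchBatch(store, transaction=transaction) as b:
            b.put("row-1", {"cf:a": "1"})
            raise RuntimeError("application error inside the with block")
    except RuntimeError:
        pass
    return store


print(write({}, transaction=True))   # {}
print(write({}, transaction=False))  # {'row-1': {'cf:a': '1'}}
```

Note that this only governs what the client decides to send. Once the mutations are actually sent, HBase guarantees atomicity per row, not across all rows of the batch.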
Upvotes: 2
Reputation: 1812
I don't know much about Python or HappyBase. As I understand it, transaction is implemented in the library as a fallback strategy. Since HBase has no transaction support beyond single-row mutations, a library can only simulate a transaction by rolling back the operations it has just performed. I think the Batch class in the code does this.
The `transaction` argument specifies whether the returned
:py:class:`Batch` instance should act in a transaction-like manner when
used as context manager in a ``with`` block of code. The `transaction`
flag cannot be used in combination with `batch_size`.
The `wal` argument determines whether mutations should be
written to the HBase Write Ahead Log (WAL). This flag can only
be used with recent HBase versions. If specified, it provides
a default for all the put and delete operations on this batch.
https://github.com/wbolster/happybase/blob/master/happybase/table.py lines 460-480
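The "simulate a transaction by rolling back" idea (and the asker's "note the row keys and delete them" plan) amounts to a compensating rollback. Below is a pure-Python sketch under stated assumptions: FlakyTable and put_all_or_undo are hypothetical, the table is an in-memory dict that fails on a chosen row key, and rows are written one at a time for clarity. Against real HBase one would presumably read old values with table.row() and undo with table.put()/table.delete(), but that adaptation is untested here.

```python
class FlakyTable:
    """Hypothetical in-memory table whose put() fails on one row key,
    standing in for a batch that dies halfway through."""

    def __init__(self, fail_on=None):
        self.rows = {}
        self.fail_on = fail_on

    def put(self, row, data):
        if row == self.fail_on:
            raise OSError(f"simulated failure writing {row!r}")
        self.rows[row] = dict(data)

    def delete(self, row):
        self.rows.pop(row, None)


def put_all_or_undo(table, rows):
    """Write rows; on failure, undo the ones already written, then re-raise.

    Not atomic: other clients may see the partial state before the undo
    runs, and the undo itself can fail."""
    previous = {}  # row key -> old value (None if the row did not exist)
    try:
        for row, data in rows.items():
            old = table.rows.get(row)
            table.put(row, data)
            previous[row] = old          # only record rows that were written
    except OSError:
        for row, old in previous.items():
            if old is None:
                table.delete(row)        # row was new: remove it
            else:
                table.put(row, old)      # row existed: restore old value
        raise


table = FlakyTable(fail_on="row-3")
table.rows["row-1"] = {"cf:a": "old"}
try:
    put_all_or_undo(table, {"row-1": {"cf:a": "new"},
                            "row-2": {"cf:a": "2"},
                            "row-3": {"cf:a": "3"}})
except OSError:
    pass
print(table.rows)  # {'row-1': {'cf:a': 'old'}}
```

Restoring old values rather than blindly deleting matters when the batch overwrote rows that already existed.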
The wal flag is also essentially a performance parameter: an operation is faster if it is not written to the WAL. From the HBase documentation:
Turning this off means that the RegionServer will not write the Put to the Write Ahead Log, only into the memstore, HOWEVER the consequence is that if there is a RegionServer failure there will be data loss.
http://hbase.apache.org/0.94/book/perf.writing.html section 11.7.5
Upvotes: 1