How to recover data when the Bulk Load API is used in Java MapReduce?

Question

In production, we use the bulk load API to load data into HBase tables, passing it two arguments (pathToHfile, targetTableName):

pathToHfile ---> location of the HFiles in Hadoop (HDFS)
targetTableName ---> the target table we want to load

When we use the bulk load API, the writes do not go to the WAL. But the WAL is what is used to recover data. So how do we recover the data in this case, since it is never written to the WAL?
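To make the premise of the question concrete, here is a toy in-memory model of the normal write path. It is a hypothetical sketch, not HBase code: the class and method names (`WalReplayDemo`, `put`, `crash`, `replayWal`) are invented for illustration. Each edit is appended to a log before it is applied to a volatile MemStore, so a crash that wipes the MemStore can be undone by replaying the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model (not HBase code) of why a WAL exists:
// a normal write is logged first, then applied to an in-memory MemStore.
final class WalReplayDemo {
    // Append-only log; in this model it survives a "crash".
    static final List<String[]> wal = new ArrayList<>();
    // Volatile in-memory buffer; a "crash" loses its contents.
    static Map<String, String> memStore = new TreeMap<>();

    static void put(String row, String value) {
        wal.add(new String[] {row, value}); // 1. log the edit first
        memStore.put(row, value);           // 2. then apply it in memory
    }

    static void crash() {
        memStore = new TreeMap<>(); // unflushed edits vanish
    }

    static void replayWal() {
        // Re-apply every logged edit to rebuild the lost MemStore.
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    public static void main(String[] args) {
        put("row1", "a");
        put("row2", "b");
        crash();
        System.out.println("after crash: " + memStore);  // after crash: {}
        replayWal();
        System.out.println("after replay: " + memStore); // after replay: {row1=a, row2=b}
    }
}
```

In this model the log only matters for data that was sitting in memory; nothing in it describes files that were never routed through `put`.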

shay__ · Accepted Answer

The WAL is used to recover changes that have not yet been written to HFiles (i.e., edits held in a MemStore that was lost when the server crashed). With bulk loading, you create the HFiles yourself and hand them over to HBase. Loading the new files into HBase is atomic, so no recovery mechanism is needed here.
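As an analogy only (this is not HBase's implementation), the "prepare a complete file, then hand it over atomically" idea can be sketched with plain `java.nio`. The names `AtomicHandoverDemo`, `handOver`, and the `staging`/`live` directories are hypothetical. The file is fully written outside the live directory and then moved in with `StandardCopyOption.ATOMIC_MOVE`, so a reader sees either no file or the whole file, never a partial one. Since the data already sits in a finished file on durable storage, there is no in-memory state to replay.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical analogy (not HBase code): write a file completely in a staging
// directory, then move it into the live directory in a single atomic step.
final class AtomicHandoverDemo {
    static Path handOver(Path stagedFile, Path liveDir) throws IOException {
        Path target = liveDir.resolve(stagedFile.getFileName());
        // ATOMIC_MOVE: the file appears in liveDir all at once or not at all.
        // Throws AtomicMoveNotSupportedException if the file system cannot do this.
        return Files.move(stagedFile, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("bulkload-demo");
        Path staging = Files.createDirectory(root.resolve("staging"));
        Path live = Files.createDirectory(root.resolve("live"));

        // Step 1: prepare the complete file outside the live directory.
        Path staged = Files.write(staging.resolve("hfile-0001"),
                List.of("row1=a", "row2=b"), StandardCharsets.UTF_8);

        // Step 2: hand it over atomically.
        Path loaded = handOver(staged, live);
        System.out.println(Files.readAllLines(loaded, StandardCharsets.UTF_8)); // [row1=a, row2=b]
        System.out.println(Files.exists(staged));                               // false
    }
}
```

Both directories live under the same temporary root so they share one file system, which is what makes an atomic move possible in this sketch.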
