Reputation: 11
I have an issue that happens infrequently where partial results are being written into our HBase database. Here is a description of my mappers and reducers:
Mappers count data associated with a particular feature and emit MapWritables whose keys are data names and whose values are counts. For example:
Key = "Feature X", MapWritables = {"Total Usage": 4, "Unique Usage": 2, "Associated Revenue": 22}, {"Total Usage": 3, "Unique Usage": 1, "Associated Revenue": 20}
Reducers sum the values across all MapWritables received for the same key. The results are written to HBase, where the key becomes the row ID, the map keys become the column names, and the sums become the column values. Given the example key and maps above, we would write into HBase:
rowID = "Feature X", columns/values = "Total Usage": 7, "Unique Usage": 3, "Associated Revenue": 42
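For reference, the reduce-side summation described above can be sketched in plain Java, using `java.util.Map` in place of Hadoop's `MapWritable` (the class and method names here are hypothetical, not the asker's actual reducer code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FeatureSum {
    // Merge all maps emitted for one feature key: columns with the
    // same name have their counts summed (hypothetical sketch, not
    // the actual Hadoop Reducer implementation).
    static Map<String, Long> sumMaps(List<Map<String, Long>> values) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> m : values) {
            for (Map.Entry<String, Long> e : m.entrySet()) {
                // merge() adds the value to any existing total for this column.
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }
}
```

In a real Reducer the inner maps would be the iterated `MapWritable` values for one key, and `totals` would be turned into a single HBase `Put` for that row.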
Twice in the last 4 months (so not very frequently), results have been written to HBase such that one of the columns has some very low number (like 1 or 3), and the remaining columns have normal numbers. When I re-run the job, the erroneous column value jumps up to its expected value. It was not the same column that was "broken" both times. No errors were written to the logs.
Has anyone else experienced similar behavior? Does anyone have ideas? Any help would be appreciated. Thanks!
Upvotes: 0
Views: 416
Reputation: 25919
If you are using Hadoop 0.20.*, it lacks append support, which can cause HBase to lose data occasionally. HBase needs append for durable writes to the WAL, and without it there is no guarantee that all writes make it to disk. If that's your case, you can simply upgrade Hadoop to a newer version (or a branch that supports append).
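If you do move to a Hadoop build that supports append, note that the HBase documentation of that era also required enabling it explicitly on the HDFS side; a sketch, assuming an `hdfs-site.xml`-based deployment (verify against the docs for your exact versions):

```xml
<!-- hdfs-site.xml: enable append/sync so HBase can durably write its WAL.
     Only relevant on Hadoop branches where append is off by default. -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```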
Upvotes: 0