Reputation: 737
We have a situation where we host data for:
All inside the same cluster/table.
With YARN we can manage resources like CPU and RAM, but during intensive scans the HDD can become a bottleneck and slow down random-read performance. How can we manage that resource?
How are situations like this handled in general?
Upvotes: 1
Views: 54
Reputation: 1812
Since MapReduce generally does not require live data, people often make a backup copy of the HBase table and run MapReduce on the backup table. Alternatively, take a snapshot of the table and run MapReduce on the snapshot.
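As a rough sketch of the snapshot route, assuming a reasonably recent HBase client (1.x or 2.x) on the classpath: the table name `my_table`, the snapshot name `my_table_snap`, the restore directory, and the row-counting mapper are all made up for illustration and are not from the question. It takes a snapshot through `Admin.snapshot` and then points a map-only job at it with `TableMapReduceUtil.initTableSnapshotMapperJob`, so the job reads the snapshot's files from HDFS instead of scanning through the region servers.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanJob {

    // Placeholder mapper: only bumps a counter per row.
    // A real job would put its analytic logic here.
    public static class RowCountMapper
            extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            context.getCounter("snapshot", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Hypothetical names, chosen only for this example.
        String tableName = "my_table";
        String snapshotName = "my_table_snap";
        // Scratch directory where the snapshot is restored for reading.
        Path restoreDir = new Path("/tmp/snapshot_restore");

        // Step 1: take a snapshot of the live table.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            admin.snapshot(snapshotName, TableName.valueOf(tableName));
        }

        // Step 2: run a map-only job over the snapshot.
        Job job = Job.getInstance(conf, "scan-" + snapshotName);
        job.setJarByClass(SnapshotScanJob.class);

        Scan scan = new Scan();
        scan.setCacheBlocks(false); // commonly disabled for full scans from MapReduce

        TableMapReduceUtil.initTableSnapshotMapperJob(
                snapshotName, scan, RowCountMapper.class,
                NullWritable.class, NullWritable.class,
                job, true, restoreDir);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that if the job runs in the same cluster, the snapshot files still live on the same HDFS disks, so this mainly takes scan load off the region servers rather than off the HDDs. Copying the backup or snapshot to separate storage would be needed to isolate disk I/O fully. The user running the job also needs read access to the underlying HBase files.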
Upvotes: 0