Reputation: 534
I have a dataset of about 10 petabytes. The data currently lives in HBase, and I am reading it with Spark's HBaseContext, but performance is poor.
Would it help to switch from HBaseContext to HiveContext on Spark?
Upvotes: 0
Views: 156
Reputation: 226
In my use case, I use mapPartitions with an HBase connection opened inside each partition. The key is knowing how to split the work.
For scans, you can build your own Scan with a row prefix, start/stop rows, etc. Gets are even simpler. For puts, you can collect a list of Puts and submit them as a batch insertion.
I don't use HBaseContext at all and get quite good performance on a table of 1.2 billion rows.
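A minimal sketch of the pattern described above: open one HBase connection per partition and write Puts in batches. This assumes an RDD of `(rowKey, value)` string pairs; the table name `my_table`, the column family `cf`, and the batch size of 500 are illustrative assumptions, not from the answer (and `foreachPartition` is used here since this side writes rather than returns data):

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

// rdd: RDD[(String, String)] of (rowKey, value) pairs -- an assumed input.
rdd.foreachPartition { rows =>
  // One connection per partition, not per record.
  val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val table = conn.getTable(TableName.valueOf("my_table"))
  try {
    // Buffer Puts and flush them in batches to cut round trips.
    rows.grouped(500).foreach { batch =>
      val puts = batch.map { case (rowKey, value) =>
        val p = new Put(Bytes.toBytes(rowKey))
        p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
        p
      }.toList
      table.put(puts.asJava)
    }
  } finally {
    table.close()
    conn.close()
  }
}
```

The same per-partition connection works for reads with `mapPartitions`, constructing a `Scan` or a list of `Get`s inside the partition instead.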
Upvotes: 0
Reputation: 772
HiveContext is used to read data from Hive, so if you switch to HiveContext the data has to be stored in Hive first. I don't think what you are trying will work.
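For illustration, this is what reading through HiveContext looks like in the Spark 1.x API; the table name `events` is hypothetical, and the table must already be registered in the Hive metastore:

```scala
import org.apache.spark.sql.hive.HiveContext

// sc is an existing SparkContext.
val hiveContext = new HiveContext(sc)

// Queries run against tables known to the Hive metastore,
// not against HBase directly.
val df = hiveContext.sql("SELECT * FROM events LIMIT 10")
df.show()
```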
Upvotes: 0