Balakrishna D

Reputation: 464

Apache Spark RDD value lookup

I loaded data from HBase, performed some operations on it, and ended up with a paired RDD. I want to use this RDD's data in my next function. The RDD holds half a million records. Can you please suggest a performance-effective way of reading data by key from the paired RDD?

Upvotes: 0

Views: 7909

Answers (4)

Mukesh_Mike

Reputation: 33

Do the following:

rdd2 = rdd1.sortByKey()
rdd2.lookup(key)

This will be fast: sortByKey gives the RDD a known RangePartitioner, so lookup only scans the single partition that can contain the key instead of the whole RDD.
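The reason a known partitioner helps: once Spark knows which partition a key must live in, lookup(key) runs a job on that one partition only. Below is a toy illustration of that mechanism in plain Python (not Spark; the function names are made up for the sketch):

```python
# Toy sketch (plain Python, NOT the Spark API) of why a known
# partitioner makes lookup(key) cheap: only one partition is scanned.

def partition_index(key, num_partitions):
    # Spark's HashPartitioner does essentially this.
    return hash(key) % num_partitions

def build_partitions(pairs, num_partitions):
    parts = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        parts[partition_index(k, num_partitions)].append((k, v))
    return parts

def lookup(parts, key):
    # Scan only the partition that can contain the key,
    # not every record in every partition.
    idx = partition_index(key, len(parts))
    return [v for k, v in parts[idx] if k == key]

pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = build_partitions(pairs, num_partitions=4)
print(lookup(parts, "a"))  # prints [1, 3]
```

With half a million records split across, say, 100 partitions, each lookup touches roughly 1% of the data instead of all of it.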

Upvotes: 1

Umberto Griffo

Reputation: 931

rdd.lookup(key) returns all values associated with the provided key, but it can only be called from the driver; you cannot use it inside transformations running on the executors.
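If you need many lookups, note that each lookup call launches a Spark job. For repeated reads it can be cheaper to collect the pairs to the driver once and index them in a local map (Spark's collectAsMap keeps only one value per key, so for duplicate keys you would build a multimap yourself). A plain-Python sketch of indexing the collected pairs (illustrative only; `pairs` stands in for the result of rdd.collect()):

```python
from collections import defaultdict

# Sketch: after collecting the paired RDD to the driver, build a
# local multimap so each subsequent lookup is a dict access
# instead of a full Spark job.
def index_pairs(pairs):
    table = defaultdict(list)
    for k, v in pairs:
        table[k].append(v)
    return table

pairs = [("a", 1), ("b", 2), ("a", 3)]  # stand-in for rdd.collect()
table = index_pairs(pairs)
print(table["a"])  # prints [1, 3]
```

Half a million small records usually fit comfortably in driver memory, but check the record size before collecting.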

Upvotes: 1

John Leach

Reputation: 528

That is a tough use case. Can you use some datastore and index it?

Check out Splice Machine (Open Source).

Upvotes: 1

anshul_cached

Reputation: 762

You can use

rddName.take(5)

where 5 is the number of elements to return; change the number as needed. Note that take and first return arbitrary elements, not the values for a specific key. To read just the first element, you can use

rddName.first

Upvotes: 0
