Jiang Xiang

Reputation: 3256

How to test if a value is a key of an RDD

I am very new to Spark and Scala, and I want to test whether a value is a key of an RDD.

The data I have is like this:

RDD data: key -> value

RDD stat: key -> statistics

What I want to do is to filter all the key-value pairs in data whose key is in stat.

My general idea is to convert the keys of an RDD into a set, then test if a value belongs to this set.

Are there better approaches, and how do I convert the keys of an RDD into a set using Scala?

Thanks.

Upvotes: 1

Views: 3144

Answers (1)

Soumya Simanta

Reputation: 11751

You can use lookup

def lookup(key: K): Seq[V]

Return the list of values in the RDD for key key. This operation is done efficiently if the RDD has a known partitioner by only searching the partition that the key maps to.
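For a single key, a membership test with lookup could look roughly like the sketch below. The names LookupExample and hasKey, the sample data, and the local master are made up for illustration, and it assumes Spark 1.3 or later (older versions also need import org.apache.spark.SparkContext._ for the pair-RDD methods).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LookupExample {
  // Hypothetical helper: true if `key` occurs at least once as a key of `stat`.
  // lookup returns a Seq of all values stored under the key, so an empty
  // result means the key is absent.
  def hasKey(stat: RDD[(String, Double)], key: String): Boolean =
    stat.lookup(key).nonEmpty

  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions for this sketch.
    val conf = new SparkConf().setAppName("lookup-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val stat = sc.parallelize(Seq("a" -> 0.5, "c" -> 1.5))
      println(hasKey(stat, "a")) // key present
      println(hasKey(stat, "z")) // key absent
    } finally {
      sc.stop()
    }
  }
}
```

Each lookup call runs a Spark job on the driver, so this suits a handful of keys, not a test applied to every record of another RDD.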

You asked -

What I want to do is to filter all the key-value pairs in data whose key is in stat.

I think you should join by key instead of doing a lookup.

join(otherDataset, [numTasks])

When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.
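As a rough sketch of the join approach applied to the question: join data with stat, then drop the statistics side again. The names JoinFilterExample and keepKeysIn and the sample data are hypothetical. It assumes stat holds at most one entry per key; with duplicate keys in stat, the join would repeat the matching rows of data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinFilterExample {
  // Hypothetical helper: keep only the pairs of `data` whose key also occurs
  // in `stat`. The inner join drops unmatched keys; mapValues then discards
  // the statistics half of each joined (value, statistics) tuple.
  def keepKeysIn(data: RDD[(String, Int)],
                 stat: RDD[(String, Double)]): RDD[(String, Int)] =
    data.join(stat).mapValues { case (value, _) => value }

  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions for this sketch.
    val conf = new SparkConf().setAppName("join-filter-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(Seq("a" -> 1, "b" -> 2, "c" -> 3))
      val stat = sc.parallelize(Seq("a" -> 0.5, "c" -> 1.5))
      // Sorted because collect() gives no ordering guarantee after a shuffle.
      val kept = keepKeysIn(data, stat).collect().sorted
      println(kept.mkString(", ")) // (a,1), (c,3)
    } finally {
      sc.stop()
    }
  }
}
```

The join shuffles both RDDs, but it keeps working when stat is too large to collect on the driver.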

One thing to avoid is to "close over an RDD inside another RDD", that is, to use one RDD inside a transformation (in this case filter) of another RDD. Nesting one RDD inside another is not allowed in Spark.
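If the keys of stat fit in driver memory, the set idea from the question can be sketched as below: collect the keys into a plain Scala Set, broadcast it, and filter against the broadcast value, so no RDD is referenced inside the filter. The names BroadcastKeysExample and keepKeysIn and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastKeysExample {
  // Hypothetical helper: filter `data` by the keys of `stat` without nesting RDDs.
  // Assumes the distinct keys of `stat` are few enough to collect on the driver.
  def keepKeysIn(sc: SparkContext,
                 data: RDD[(String, Int)],
                 stat: RDD[(String, Double)]): RDD[(String, Int)] = {
    // Not allowed: data.filter { case (k, _) => stat.lookup(k).nonEmpty }
    // because that uses the RDD `stat` inside a transformation of `data`.
    val statKeys: Set[String] = stat.keys.collect().toSet
    val keysBc = sc.broadcast(statKeys) // shipped once per executor
    data.filter { case (k, _) => keysBc.value.contains(k) }
  }

  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions for this sketch.
    val conf = new SparkConf().setAppName("broadcast-keys-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(Seq("a" -> 1, "b" -> 2, "c" -> 3))
      val stat = sc.parallelize(Seq("a" -> 0.5, "c" -> 1.5))
      println(keepKeysIn(sc, data, stat).collect().sorted.mkString(", ")) // (a,1), (c,3)
    } finally {
      sc.stop()
    }
  }
}
```

Unlike the join, this avoids a shuffle of data and does not duplicate rows when stat has repeated keys, at the cost of holding every key in memory.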

Upvotes: 2
