Ravi.Kumar

Reputation: 767

Accessing a specific element of an Array RDD in apache-spark scala

I have an RDD that contains key-value pairs. I want to get the element(s) with a specific key (say 4).

scala> val a = sc.parallelize(List("dog","tiger","lion","cat","spider","eagle"),2)
a: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:27
scala> val b = a.keyBy(_.length)
b: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[1] at keyBy at <console>:29

I tried to apply a filter to it, but I get an error:

scala> val c = b.filter(p => p(0) = 4);
<console>:31: error: value update is not a member of (Int, String)
         val c = b.filter(p => p(0) = 4);

I want to print the key-value pair with a specific key (say 4) as Array((4,lion)).

The data always arrives in the form of an array of key-value pairs.

Upvotes: 1

Views: 7008

Answers (2)

maasg

Reputation: 37435

There's a lookup method applicable to RDDs of Key-Value pairs (RDDs of type RDD[(K,V)]) that directly offers this functionality.

b.lookup(4)
// res4: Seq[String] = WrappedArray(lion)

b.lookup(5)
// res6: Seq[String] = WrappedArray(tiger, eagle)
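Note that lookup returns only the values for the key, not the pairs. If the Array((4,lion)) shape from the question is needed, the key can be re-attached afterwards. The following is a minimal sketch, assuming a local master; the object name LookupSketch and the helper pairsForKey are hypothetical names introduced for illustration and are not part of Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LookupSketch {
  // Hypothetical helper: lookup returns Seq[String] (values only),
  // so pair each value with the key again to get (key, value) tuples.
  def pairsForKey(rdd: RDD[(Int, String)], key: Int): Array[(Int, String)] =
    rdd.lookup(key).map(value => (key, value)).toArray

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads; adjust the master as needed.
    val conf = new SparkConf().setAppName("lookup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val b = sc
        .parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
        .keyBy(_.length)

      // Prints: Array((4,lion))
      println(pairsForKey(b, 4).mkString("Array(", ", ", ")"))
    } finally {
      sc.stop()
    }
  }
}
```

Both lookup and collect bring results back to the driver, so this suits keys that match a small number of elements.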

Upvotes: 1

mrsrinivas

Reputation: 35404

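Use p._1 instead of p(0), and == instead of =. Tuple fields are accessed with _1, _2, and so on, and a single = is assignment: Scala rewrites p(0) = 4 as p.update(0, 4), which is why the compiler reports that update is not a member of (Int, String).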

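import org.apache.spark.rdd.RDD

val rdd = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 1)

// Key each word by its length, then keep only the pairs whose key is 4.
val kvRdd: RDD[(Int, String)] = rdd.keyBy(_.length)
val filterRdd: RDD[(Int, String)] = kvRdd.filter(p => p._1 == 4)

// Collect the matching pairs to the driver and print them.
println(filterRdd.collect().toList)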

List((4,lion))
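The predicate can also be written with a pattern match, which names the key instead of using a positional accessor. Below is a small sketch that checks both forms on a plain Scala List so it runs without a cluster; the object name TupleFilterSketch and its members are hypothetical. The same function literals can be passed to filter on an RDD[(Int, String)].

```scala
object TupleFilterSketch {
  // Same sample data as in the question, keyed by word length.
  val pairs: List[(Int, String)] =
    List("dog", "tiger", "lion", "cat", "spider", "eagle").map(w => (w.length, w))

  // Positional accessor: _1 is the first field of the tuple.
  def byAccessor(key: Int): List[(Int, String)] =
    pairs.filter(p => p._1 == key)

  // Pattern match: destructure the tuple and compare the key by name.
  def byPattern(key: Int): List[(Int, String)] =
    pairs.filter { case (k, _) => k == key }

  def main(args: Array[String]): Unit = {
    println(byAccessor(4)) // List((4,lion))
    println(byPattern(4))  // List((4,lion))
  }
}
```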

Upvotes: 1
