Sushant Gupta

Reputation: 1527

Iterate over Spark cogroup() PairRDD output in Scala

I created two pair RDDs in Spark:

var pairrdd = sc.parallelize(List((1,2),(3,4),(3,6)))
var pairrdd2 = sc.parallelize(List((3,9)))

I applied the cogroup function:

var cogrouped = pairrdd.cogroup(pairrdd2)

The type of the cogrouped RDD looks like this:

cogrouped: org.apache.spark.rdd.RDD[(Int, (Iterable[Int], Iterable[Int]))] = MapPartitionsRDD[801] at cogroup at <console>:60

I am trying to create a function to iterate over these values:

def iterateThis((x: Int,(x1:Iterable[Int],x2:Iterable[Int])))={
  println(x1.mkString(","))
}

but I am getting the error below:

<console>:21: error: identifier expected but '(' found.
       def iterateThis((x: Int,(x1:Iterable[Int],x2:Iterable[Int])))={
                   ^

Upvotes: 0

Views: 408

Answers (1)

Jean Logeart

Reputation: 53839

Your argument is of type (Int, (Iterable[Int], Iterable[Int])):

def iterateThis(arg: (Int, (Iterable[Int], Iterable[Int]))) = {
  // Tuple patterns are not allowed in a parameter list,
  // so destructure the tuple inside the body instead.
  val (_, (x1, _)) = arg
  println(x1.mkString(","))
}
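Alternatively, you can destructure with a pattern-matching anonymous function (`{ case ... => ... }`) when iterating, which avoids the named `arg` entirely. A minimal sketch using plain Scala collections of the same element type as the cogroup output (the `pairs` value here stands in for `cogrouped.collect()`):

```scala
// Elements have the same shape as the cogroup output:
// (key, (values from the first RDD, values from the second RDD))
val pairs: Seq[(Int, (Iterable[Int], Iterable[Int]))] =
  Seq((1, (Iterable(2), Iterable.empty[Int])),
      (3, (Iterable(4, 6), Iterable(9))))

// A `case` pattern destructures each tuple in place.
pairs.foreach { case (key, (x1, x2)) =>
  println(s"$key -> ${x1.mkString(",")} | ${x2.mkString(",")}")
}
```

The same `foreach { case (key, (x1, x2)) => ... }` form works directly on the RDD, since `RDD.foreach` also takes a function of the element type.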

Upvotes: 1

Related Questions