lee

Reputation: 159

How can I filter elements from a Spark RDD where the difference between adjacent elements is greater than a threshold?

I have a problem in Spark with Scala: I want to get each element whose difference from the preceding element is greater than a threshold. I create an RDD like this:

  [2,3,5,8,19,3,5,89,20,17]

I want to subtract each pair of adjacent elements like this:

 a.apply(1)-a.apply(0), a.apply(2)-a.apply(1), …… a.apply(a.length-1)-a.apply(a.length-2)

If a result is greater than the threshold of 10, then output the second element of the pair, like this:

[19,89]

How can I do this in Scala from an RDD?
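To make the intent concrete, here is the same operation on a plain Scala collection (ignoring Spark for a moment):

```scala
val a = Seq(2, 3, 5, 8, 19, 3, 5, 89, 20, 17)

// keep the second element of every adjacent pair whose difference exceeds 10
val result = a.sliding(2).collect {
  case Seq(prev, cur) if cur - prev > 10 => cur
}.toList

println(result)  // List(19, 89)
```

The question is how to express this pairing of adjacent elements on a distributed RDD.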

Upvotes: 1

Views: 167

Answers (2)

koiralo

Reputation: 23119

You can pair each element of the RDD with the element that follows it, producing tuples like (2,3), (3,5), (5,8), and then keep the second element of every pair whose difference is greater than 10.

val rdd = spark.sparkContext.parallelize(Seq(2, 3, 5, 8, 19, 3, 5, 89, 20, 17))

// RDD.zip requires the same number of elements in each partition, so zipping
// the RDD with a filtered copy of itself is not reliable; instead, index each
// element and join it with its successor
val indexed = rdd.zipWithIndex.map { case (v, i) => (i, v) }
val successors = indexed.map { case (i, v) => (i - 1, v) }

indexed.join(successors)                          // (index, (current, next))
  .map { case (_, (cur, next)) => (next - cur, next) }
  .filter { case (diff, _) => diff > 10 }
  .map { case (_, next) => next }
  .foreach(println)

Hope this helps!

Upvotes: 0

Ramesh Maharjan

Reputation: 41987

If you have data as

val data = Seq(2,3,5,8,19,3,5,89,20,17)

you can create rdd as

val rdd = sc.parallelize(data)

What you desire can be achieved by doing the following

import org.apache.spark.mllib.rdd.RDDFunctions._

val finalrdd = rdd
  .sliding(2)                      // adjacent pairs: Array(2,3), Array(3,5), …
  .map(x => (x(1), x(1) - x(0)))   // (second element, difference)
  .filter(y => y._2 > 10)
  .map(z => z._1)

Doing

finalrdd.foreach(println)

should print

19
89

Upvotes: 1
