Spark: How to 'scan' RDD collections?

Question

Does Spark have any analog of Scala scan operation to work on RDD collections? (for details please see Reduce, fold or scan (Left/Right)?)

For example:

val abc = List("A", "B", "C")

def add(res: String, x: String) = { 
  println(s"op: $res + $x = ${res + x}")
  res + x
}

So to get:

abc.scanLeft("z")(add)
// op: z + A = zA      // same operations as foldLeft above...
// op: zA + B = zAB
// op: zAB + C = zABC
// res: List[String] = List(z, zA, zAB, zABC) // maps intermediate results

Any other means to achieve the same result?

Update

What is "Spark" way to solve, for example, the following problem:

Compute elements of the vector as (in pseudocode):

x(i) = SomeFun(for k from 0 to i-1)(y(k))

Should I collect RDD for this? No other way?

Update 2

Ok, I understand the general problem. Yet maybe you could advise me on the particular case I have to deal with.

I have a list of ints as input RDD and I have to build an outptut RDD, where the following should hold:

1) input.length == output.length // output list is of the same length as input

2) output(i) = sum( range (0..i), input(i)) / q^i // i-th element of output list equals sum of input elements from 0 to i divided by i-th power of some constant q

In fact I need a combination of map and fold function to solve this.

Another idea is to write a recursive fold on diminishing tails of the input list. But this is super inefficient and AFAIK Spark does not have tail or init function for RDD.

How would you solve this problem in Sparck?

Spark: How to 'scan' RDD collections?

Answers (1)

Related Questions

Spark: How to &#39;scan&#39; RDD collections?

Answers (1)

Related Questions

Spark: How to 'scan' RDD collections?