lkn2993

Reputation: 566

Using apache spark to iterate over string

For example, we have the string "abcdabcd"

And we want to count all the pairs of adjacent characters (e.g. "ab" or "da") that occur in the string.

So how do we do that in Apache Spark?

I am asking because it looks like RDD does not support a sliding function:

rdd.sliding(2).toList
// Count the number of pairs in the list
// Fails to compile on the first line: sliding is not a member of RDD

Upvotes: 1

Views: 760

Answers (1)

Odomontois

Reputation: 16308

Apparently sliding is supported via MLlib, as shown by zero323 here:

import org.apache.spark.mllib.rdd.RDDFunctions._

val str = "abcdabcd"

val rdd = sc.parallelize(str)

rdd.sliding(2).map(_.mkString).toLocalIterator.foreach(println)

will show

ab
bc
cd
da
ab
bc
cd
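To actually count the pairs, as the question asks, one way is to aggregate the sliding windows with `countByValue`. A sketch building on the same MLlib import (assuming an active `SparkContext` named `sc`):

```scala
import org.apache.spark.mllib.rdd.RDDFunctions._

val str = "abcdabcd"
val rdd = sc.parallelize(str)

// Turn each sliding window of 2 chars into a String, then count occurrences
val pairCounts = rdd.sliding(2).map(_.mkString).countByValue()
// Map("ab" -> 2, "bc" -> 2, "cd" -> 2, "da" -> 1)
```

`countByValue` returns the counts as a local `Map` on the driver, which is fine here since the number of distinct pairs is small.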

Upvotes: 5
