Federico Taschin

Reputation: 2195

Scala: all substrings of length k

I am totally new to Scala, and I am having trouble understanding how to use functions like map() or foreach() to perform operations on strings.

In particular, I am trying to extract all unique contiguous substrings of length k from a string (called k-shingles). My function kshingles(s: String, k: Int) called on the string "abcdab" should return Set("ab", "bc", "cd", "da").
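A minimal sketch of the function described above that uses map, not as the asker's or answerer's code but as one illustration. Instead of mapping over the characters, it maps over the valid start indices, takes one substring per index, and collects the results into a Set so duplicates are dropped. The object name IndexShingles is hypothetical.

```scala
object IndexShingles {
  /** All distinct contiguous substrings of length k (k-shingles) of s. */
  def kshingles(s: String, k: Int): Set[String] = {
    require(k > 0, "k must be positive")
    // Valid start indices are 0 .. s.length - k; the range is empty when k > s.length.
    (0 to s.length - k)
      .map(i => s.substring(i, i + k)) // one shingle per start index
      .toSet                           // the Set keeps only one copy of a repeated shingle
  }
}
```

For "abcdab" and k = 2, the indices 0 to 4 produce "ab", "bc", "cd", "da", "ab", and toSet reduces these to Set("ab", "bc", "cd", "da").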

How can I achieve this in Scala? A bonus would be an approach that can be parallelized (e.g. using Spark).
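Because each shingle depends only on its start index, the index-based formulation splits naturally across workers. The sketch below swaps in the JDK's parallel streams (java.util.stream.IntStream) rather than Spark so that it runs without extra dependencies. It assumes Scala 2.13 or later for scala.jdk.CollectionConverters, and the names ParallelShingles and kshinglesPar are hypothetical. In Spark, the analogous idea would be to distribute the same index range and map the same substring function over it.

```scala
import java.util.function.IntFunction
import java.util.stream.{Collectors, IntStream}
import scala.jdk.CollectionConverters._

object ParallelShingles {
  /** k-shingles of s, computed over a parallel stream of start indices. */
  def kshinglesPar(s: String, k: Int): Set[String] = {
    require(k > 0, "k must be positive")
    // Each start index maps to its shingle independently of all other indices.
    val substringAt = new IntFunction[String] {
      def apply(i: Int): String = s.substring(i, i + k)
    }
    IntStream
      .rangeClosed(0, s.length - k)          // empty stream when k > s.length
      .parallel()                            // let the JDK split the range across threads
      .mapToObj(substringAt)
      .collect(Collectors.toSet[String]())   // java.util.Set, duplicates removed
      .asScala                               // view as a Scala mutable.Set
      .toSet                                 // copy into an immutable Set
  }
}
```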

Upvotes: 2

Views: 298

Answers (1)

Tomer Shetah

Reputation: 8529

sliding is the method you are looking for. From the sliding documentation:

Groups elements in fixed size blocks by passing a "sliding window" over them (as opposed to partitioning them, as is done in grouped.) The "sliding window" step is set to one.
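To make the contrast in that description concrete, here is a small sketch comparing sliding and grouped on the question's string. It goes through toSeq and mkString so that it only relies on each window being a sequence of Char. The object name SlidingVsGrouped is hypothetical.

```scala
object SlidingVsGrouped {
  val s = "abcdab"
  // sliding(2): overlapping windows of size 2 that advance one character at a time.
  val windows: List[String] = s.toSeq.sliding(2).map(_.mkString).toList
  // grouped(2): consecutive, non-overlapping blocks of size 2.
  val blocks: List[String] = s.toSeq.grouped(2).map(_.mkString).toList
}
```

Here windows is List("ab", "bc", "cd", "da", "ab") and blocks is List("ab", "cd", "ab"). The repeated "ab" in windows is why the solution below finishes with toSet.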

For example, "abcdab".sliding(2).toSet gives the result you are looking for.

In Scala 2.13, String.sliding is deprecated, so the Scala 2.13 solution is:

"abcdab".toSeq.sliding(2).map(_.unwrap).toSet
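One caveat when wrapping this in the question's kshingles(s, k): if k is larger than the string, sliding does not return an empty result but a single shorter window. For example, sliding(3) over "ab" yields just "ab". The minimal sketch below filters out such windows. It uses mkString in place of unwrap to turn each window back into a String, and the object name SlidingShingles is hypothetical.

```scala
object SlidingShingles {
  /** Distinct length-k substrings of s, built with sliding. */
  def kshingles(s: String, k: Int): Set[String] = {
    require(k > 0, "k must be positive")
    s.toSeq
      .sliding(k)
      .map(_.mkString)         // each window (a Seq[Char]) back to a String
      .filter(_.length == k)   // drop the single short window produced when k > s.length
      .toSet
  }
}
```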

Upvotes: 4
