qwerty

Reputation: 546

Filter an RDD recursively in Scala / Spark 1.5.2

I have an RDD with 50 columns. For each row, I want to get the first element plus the last 5 columns, but only as long as the first character of the last column is a digit: if the first character of the last column is a letter, that column should be dropped and the check repeated on the new last column, iteratively. For instance, let's suppose the original RDD has the following content (keys are omitted for readability):

[45 first values], 1, 2, a, 3, 4
[44 first values], 0, 1, 2, 3, 4, b
[43 first values], 10, 11, 12, 13, 14, q, a

The desired output after the transformation would be:

1, 2, a, 3, 4
0, 1, 2, 3, 4
10, 11, 12, 13, 14

I managed to filter the last element of the input RDD with the following statement:

var aux = rdd.map(row => row.slice(0, 1) ++ row.slice(45, 50)).filter(elem => elem(5)._2(0).isDigit)

Following the same syntax, I can also filter on the n-th element of the original RDD:

var aux = rdd.map(row => row.slice(0, 1) ++ row.slice(44, 50)).filter(elem => elem(5)._2(0).isDigit).map(_.slice(0, 6))

My question is: is there any way to do this iteratively, specifying a range of elements inside the map and/or the filter (or something similar), so that the whole process fits in a couple of statements? Or is it required to save the result of each of these statements in an auxiliary variable and then merge every partial result into a new RDD?

Upvotes: 2

Views: 212

Answers (1)

Cyrille Corpet

Reputation: 5305

What you probably want (in your map method) is something like

row.dropRightWhile(cell => !cell(0).isDigit)

However, `dropRightWhile` is not a method on `Seq`, so you will probably need a `reverse` before and after this treatment, as follows:

row.reverse.dropWhile(cell => !cell(0).isDigit).reverse
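Applied to the sample rows from the question (modeled here as plain `Seq[String]` values without the key/value pairing, so the `._2` access is dropped), this one-liner produces the desired output. A minimal sketch on local collections; in the real job the same expression would go inside `rdd.map { row => ... }`:

```scala
// Sample rows from the question, without keys.
val rows = Seq(
  Seq("1", "2", "a", "3", "4"),
  Seq("0", "1", "2", "3", "4", "b"),
  Seq("10", "11", "12", "13", "14", "q", "a")
)

// Drop trailing cells whose first character is not a digit.
val cleaned = rows.map(row => row.reverse.dropWhile(cell => !cell(0).isDigit).reverse)

cleaned.foreach(row => println(row.mkString(", ")))
// 1, 2, a, 3, 4
// 0, 1, 2, 3, 4
// 10, 11, 12, 13, 14
```

Note that the inner `"a"` in the first row survives, because the drop stops as soon as a digit-leading cell is found at the end.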

Upvotes: 2
