Reputation: 546
I have an RDD with 50 columns. For each row I want to keep the first element and the last 5 columns, provided the value in the last column starts with a digit. If the last column starts with a letter instead, I want to discard that column and repeat the check on the new last column, and so on. For instance, suppose the original RDD has the following content (keys are omitted to make it easier to read):
[45 first values], 1, 2, a, 3, 4
[44 first values], 0, 1, 2, 3, 4, b
[43 first values], 10, 11, 12, 13, 14, q, a
The desired output after the transformation would be:
1, 2, a, 3, 4
0, 1, 2, 3, 4
10, 11, 12, 13, 14
I managed to apply the filter to the last element of the input RDD with the following statement:
val aux = rdd.map(row => row.slice(0, 1) ++ row.slice(45, 50)).filter(elem => elem(5)._2(0).isDigit)
Following the same pattern, I can also apply the filter to the n-th element from the end of each row, for example the second-to-last:
val aux = rdd.map(row => row.slice(0, 1) ++ row.slice(44, 50)).filter(elem => elem(5)._2(0).isDigit).map(_.slice(0, 6))
My question is: is there a way to do this iteratively, for example by specifying a range of elements inside the map and/or the filter, so that the whole process takes only a couple of statements? Or do I have to save the result of each of these statements in an auxiliary variable and then merge all the results into a new RDD?
Upvotes: 2
Views: 212
Reputation: 5305
What you probably want (in your map method) is something like
row.dropRightWhile(cell => !cell(0).isDigit)
However, dropRightWhile is not a method on Seq, so you probably need to do a reverse before and after this treatment as follows:
row.reverse.dropWhile(cell => !cell(0).isDigit).reverse
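To connect this to the question, here is a minimal sketch that combines the reverse/dropWhile trick with keeping the first element and the last 5 remaining columns. It assumes each cell is a (key, value) pair of strings, as the `._2` in the question suggests; the names `TrimRow`, `Cell` and `trimRow` are hypothetical and only for illustration.

```scala
// Hypothetical helper, not part of Spark or of the original posts.
object TrimRow {
  // Assumption: each cell is a (key, value) pair of strings.
  type Cell = (String, String)

  // Drop trailing cells whose value does not start with a digit,
  // then keep the first cell plus the last `n` of the remaining cells.
  def trimRow(row: Seq[Cell], n: Int = 5): Seq[Cell] = {
    val trimmed = row.reverse
      .dropWhile(cell => !cell._2.headOption.exists(_.isDigit)) // empty values are dropped too
      .reverse
    // drop(1) before takeRight avoids duplicating the first cell on short rows
    trimmed.take(1) ++ trimmed.drop(1).takeRight(n)
  }
}
```

Assuming the RDD holds rows of type Seq[(String, String)], this could probably be applied in a single statement such as rdd.map(row => TrimRow.trimRow(row)), with no auxiliary RDDs to merge.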
Upvotes: 2