Reputation: 611
Hi I've tried to insert element to rdd array[String] using scala in spark.
Here is example.
val data = RDD[Array[String]] = Array(Array(1,2,3), Array(1,2,3,4), Array(1,2)).
I want to make length 4 of all arrays in this data.
If the length of array is less than 4, I want to fill the NULL value in the array.
here is my code that I tried to solve.
val newData = data.map(x =>
if(x.length < 4){
for(i <- x.length until 4){
x.union("NULL")
}
}
else{
x
}
)
But The result is Array[Any] = Array((), Array(1, 2, 3, 4), ())
.
So I tried another ways. I used yield
on for loop.
val newData = data.map(x =>
if(x.length < 4){
for(i <- x.length until 4)yield{
x.union("NULL")
}
}
else{
x
}
)
The result is Array[Object] = Array(Vector(Array(1, 2, 3, N, U, L, L)), Array(1, 2, 3, 4), Vector(Array(1, 2, N, U, L, L), Array(1, 2, N, U, L, L)))
these are not what I want. I want to return like this
RDD[Array[String]] = Array(Array(1,2,3,NULL), Array(1,2,3,4), Array(1,2,NULL,NULL)).
What should I do? Is there a method to solve it?
Upvotes: 0
Views: 8022
Reputation: 966
I solved your use case with the following code:
val initialRDD = sparkContext.parallelize(Array(Array[AnyVal](1, 2, 3), Array[AnyVal](1, 2, 3, 4), Array[AnyVal](1, 2, 3)))
val transformedRDD = initialRDD.map(array =>
if (array.length < 4) {
val transformedArray = Array.fill[AnyVal](4)("NULL")
Array.copy(array, 0, transformedArray, 0, array.length)
transformedArray
} else {
array
}
)
val result = transformedRDD.collect()
Upvotes: 1
Reputation: 3725
union
is a functional operation, it doesn't change the array x
. You don't need to do this with a loop, though, and any loop implementations will probably be slower -- it's much better to create one new collection with all the NULL values instead of mutating something every time you add a null. Here's a lambda function that should work for you:
def fillNull(x: Array[Int], desiredLength: Int): Array[String] = {
x.map(_.toString) ++ Array.fill(desiredLength - x.length)("NULL")
}
val newData = data.map(fillNull(_, 4))
Upvotes: 2