S.Kang
S.Kang

Reputation: 611

how to insert element to rdd array in spark

Hi I've tried to insert element to rdd array[String] using scala in spark.

Here is example.

val data =  RDD[Array[String]] = Array(Array(1,2,3), Array(1,2,3,4), Array(1,2)).

I want to make length 4 of all arrays in this data.

If the length of array is less than 4, I want to fill the NULL value in the array.

here is my code that I tried to solve.

val newData = data.map(x => 
    if(x.length < 4){
        for(i <- x.length until 4){
        x.union("NULL") 
        }
    }
    else{
        x
    }
)

But The result is Array[Any] = Array((), Array(1, 2, 3, 4), ()).

So I tried another ways. I used yield on for loop.

val newData = data.map(x => 
    if(x.length < 4){
        for(i <- x.length until 4)yield{
        x.union("NULL") 
        }
    }
    else{
        x
    }
)

The result is Array[Object] = Array(Vector(Array(1, 2, 3, N, U, L, L)), Array(1, 2, 3, 4), Vector(Array(1, 2, N, U, L, L), Array(1, 2, N, U, L, L)))

these are not what I want. I want to return like this

RDD[Array[String]] = Array(Array(1,2,3,NULL), Array(1,2,3,4), Array(1,2,NULL,NULL)).

What should I do? Is there a method to solve it?

Upvotes: 0

Views: 8022

Answers (2)

Anton Okolnychyi
Anton Okolnychyi

Reputation: 966

I solved your use case with the following code:

val initialRDD = sparkContext.parallelize(Array(Array[AnyVal](1, 2, 3), Array[AnyVal](1, 2, 3, 4), Array[AnyVal](1, 2, 3)))
val transformedRDD = initialRDD.map(array =>
  if (array.length < 4) {
    val transformedArray = Array.fill[AnyVal](4)("NULL")
    Array.copy(array, 0, transformedArray, 0, array.length)
    transformedArray
  } else {
    array
  }
)
val result = transformedRDD.collect()

Upvotes: 1

Tim
Tim

Reputation: 3725

union is a functional operation, it doesn't change the array x. You don't need to do this with a loop, though, and any loop implementations will probably be slower -- it's much better to create one new collection with all the NULL values instead of mutating something every time you add a null. Here's a lambda function that should work for you:

def fillNull(x: Array[Int], desiredLength: Int): Array[String] = {
  x.map(_.toString) ++ Array.fill(desiredLength - x.length)("NULL") 
}

val newData = data.map(fillNull(_, 4))

Upvotes: 2

Related Questions