Mohan

Reputation: 473

Replace the value of one column from another column in spark dataframe

I have a dataframe like the one below:

+---+------------+----------------------------------------------------------------------+
|id |indexes     |arrayString                                                           |
+---+------------+----------------------------------------------------------------------+
|2  |1,3         |[WrappedArray(3, Str3), WrappedArray(1, Str1)]                        |
|1  |2,4,3       |[WrappedArray(2, Str2), WrappedArray(3, Str3), WrappedArray(4, Str4)] |
|0  |1,2,3       |[WrappedArray(1, Str1), WrappedArray(2, Str2), WrappedArray(3, Str3)] |
+---+------------+----------------------------------------------------------------------+

I want to loop through arrayString, treating the first element of each inner array as the index and the second element as the string. Then I want to replace each value in indexes with the string that corresponds to that index in arrayString. The output should look like this:

+---+---------------+
|id |replacedString |
+---+---------------+
|2  |Str1,Str3      |
|1  |Str2,Str4,Str3 |
|0  |Str1,Str2,Str3 |
+---+---------------+

I tried using the below udf function.

val replaceIndex = udf((itemIndex: String, arrayString: Seq[Seq[String]]) => {
  val itemIndexArray = itemIndex.split("\\,")
  arrayString.map(i => {
    itemIndexArray.updated(i(0).toInt, i(1))
  })
  itemIndexArray
})

This gives me an error and I am not getting my desired output. Is there another way to achieve this? I can't use explode and join, because I want the indexes replaced with strings without losing their order.


Upvotes: 1

Views: 2855

Answers (1)

koiralo

Reputation: 23099

You can create a UDF like the one below to get the required result: convert the array of arrays to a map, then look up each index as a key in that map.

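import org.apache.spark.sql.functions.udf

val replaceIndex = udf((itemIndex: String, arrayString: Seq[Seq[String]]) => {
  // Split the comma-separated indexes, keeping their original order
  val indexList = itemIndex.split(",")
  // Build a lookup from index (first element) to string (second element)
  val array = arrayString.map(x => x(0) -> x(1)).toMap
  // Replace each index with its string and join the result with commas
  indexList.map(array).mkString(",")
})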

dataframe.withColumn("arrayString", replaceIndex($"indexes", $"arrayString"))
  .show(false)

Output:

+---+-------+--------------+
|id |indexes|arrayString   |
+---+-------+--------------+
|2  |1,3    |Str1,Str3     |
|1  |2,4,3  |Str2,Str4,Str3|
|0  |1,2,3  |Str1,Str2,Str3|
+---+-------+--------------+
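
Note that the map lookup throws a NoSuchElementException if an index has no matching entry in arrayString. As a minimal sketch of the same lookup logic in plain Scala (no Spark needed), the version below falls back to a placeholder instead. The names `ReplaceIndexSketch` and `replaceIndexes` and the "?" placeholder are assumptions made for illustration, not part of the original UDF.

```scala
// Plain-Scala sketch of the lookup logic; runs without Spark.
// `replaceIndexes` and the "?" placeholder are hypothetical, chosen for illustration.
object ReplaceIndexSketch {
  def replaceIndexes(indexes: String,
                     pairs: Seq[Seq[String]],
                     missing: String = "?"): String = {
    // Build an index -> string lookup from the (index, string) pairs.
    val lookup: Map[String, String] = pairs.map(p => p(0) -> p(1)).toMap
    // Resolve each index in its original order; unknown indexes map to `missing`.
    indexes.split(",").map(_.trim).map(i => lookup.getOrElse(i, missing)).mkString(",")
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(Seq("2", "Str2"), Seq("3", "Str3"), Seq("4", "Str4"))
    println(replaceIndexes("2,4,3", pairs)) // prints Str2,Str4,Str3
    println(replaceIndexes("2,9", pairs))   // prints Str2,?
  }
}
```

The body of `replaceIndexes` could be dropped into the udf above if a fallback is preferable to a failed job; whether a placeholder or an error is the right behavior depends on the data.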

Hope this helps!

Upvotes: 1
