Rajni Kant Sharma

Reputation: 114

Concatenate all struct fields nested in an array in Spark

My schema is shown below. I need to concatenate the #VALUE, @DescriptionCode, and @LanguageCode fields, which are nested inside the elements of the description array.

root
 |-- partnumber: string (nullable = true)
 |-- brandlabel: string (nullable = true)
 |-- availabledate: string (nullable = true)
 |-- description: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- #VALUE: string (nullable = true)
 |    |    |-- @DescriptionCode: string (nullable = true)
 |    |    |-- @LanguageCode: string (nullable = true)

I have tried many approaches, but nothing has worked. I need the following schema:

root
 |-- partnumber: string (nullable = true)
 |-- brandlabel: string (nullable = true)
 |-- availabledate: string (nullable = true)
 |-- descriptions: array (nullable = true)
 |    |-- element: string (containsNull = true)

Upvotes: 0

Views: 2787

Answers (2)

Kumar

Reputation: 41

root
 |-- partnumber: string (nullable = true)
 |-- brandlabel: string (nullable = true)
 |-- availabledate: string (nullable = true)
 |-- description: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- #VALUE: string (nullable = true)
 |    |    |-- @DescriptionCode: string (nullable = true)
 |    |    |-- @LanguageCode: string (nullable = true)
 |    |    |-- @Language: string (nullable = true)

Suppose we want to concatenate the first two struct fields into one string separated by ":", and the next two struct fields into another. The desired schema is:


root
 |-- partnumber: string (nullable = true)
 |-- brandlabel: string (nullable = true)
 |-- availabledate: string (nullable = true)
 |-- descriptions: array (nullable = true)
 |    |-- element1: string (containsNull = true)
 |    |-- element2: string (containsNull = true)
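
One possible way to get close to this shape is sketched below. A Spark array has a single element type, so the sketch assumes that each element becomes a struct with two string fields, element1 and element2. It also assumes Spark 3.0 or later (for the Scala `transform` function), pairs the fields in schema order, and uses made-up sample data; the object and helper names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, struct, transform}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object PairedDescriptions {
  // Hypothetical sample data matching the four-field schema above.
  def sampleData(spark: SparkSession): DataFrame = {
    val element = StructType(Seq(
      StructField("#VALUE", StringType),
      StructField("@DescriptionCode", StringType),
      StructField("@LanguageCode", StringType),
      StructField("@Language", StringType)))
    val schema = StructType(Seq(
      StructField("partnumber", StringType),
      StructField("description", ArrayType(element))))
    val rows = java.util.Arrays.asList(
      Row("P1", Seq(Row("Oil filter", "SHO", "EN", "English"))))
    spark.createDataFrame(rows, schema)
  }

  // Map every struct in `description` to a struct of two ":"-joined strings.
  def pairFields(df: DataFrame): DataFrame =
    df.withColumn(
      "descriptions",
      transform(col("description"), x => struct(
        concat_ws(":", x.getField("#VALUE"), x.getField("@DescriptionCode")).as("element1"),
        concat_ws(":", x.getField("@LanguageCode"), x.getField("@Language")).as("element2")))
    ).drop("description")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("paired-descriptions").getOrCreate()
    val result = pairFields(sampleData(spark))
    result.printSchema()
    result.show(truncate = false)
    spark.stop()
  }
}
```

Note that `concat_ws` skips null fields instead of producing a null result, which may or may not be what is wanted here.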

Upvotes: 0

Daniel de Paula

Reputation: 17872

I believe you need to create a User Defined Function (UDF):

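import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._

// Concatenate the three string fields of each struct in the array.
// Note: a null field is rendered as the string "null" by `+`.
val func: Seq[Row] => Seq[String] = {
  _.map(element =>
    element.getAs[String]("#VALUE") +
    element.getAs[String]("@DescriptionCode") +
    element.getAs[String]("@LanguageCode")
  )
}

val myUDF = udf(func)

// Add the new array<string> column and drop the original array<struct> column.
df.withColumn("descriptions", myUDF(col("description"))).drop(col("description"))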

For more information about UDFs, you can read this article.
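
As a possible alternative to a UDF, the same result can probably be reached with the built-in `transform` higher-order function, assuming Spark 2.4 or later. The sketch below is only an illustration: the ":" separator, the sample data, and the object and helper names are assumptions that do not come from the question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object ConcatDescriptions {
  // Hypothetical sample data with the nested part of the question's schema.
  def sampleData(spark: SparkSession): DataFrame = {
    val element = StructType(Seq(
      StructField("#VALUE", StringType),
      StructField("@DescriptionCode", StringType),
      StructField("@LanguageCode", StringType)))
    val schema = StructType(Seq(
      StructField("partnumber", StringType),
      StructField("description", ArrayType(element))))
    val rows = java.util.Arrays.asList(
      Row("P1", Seq(Row("Oil filter", "SHO", "EN"), Row("Filtre a huile", "SHO", "FR"))))
    spark.createDataFrame(rows, schema)
  }

  // Replace `description` (array<struct>) with `descriptions` (array<string>).
  // Field names containing # or @ must be quoted with backticks in Spark SQL.
  def concatDescriptions(df: DataFrame): DataFrame =
    df.withColumn(
      "descriptions",
      expr("""transform(description, x ->
                concat_ws(':', x.`#VALUE`, x.`@DescriptionCode`, x.`@LanguageCode`))""")
    ).drop("description")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("concat-descriptions").getOrCreate()
    val result = concatDescriptions(sampleData(spark))
    result.printSchema()
    result.show(truncate = false)
    spark.stop()
  }
}
```

Unlike the UDF above, `concat_ws` skips null fields rather than rendering them as the text "null", and a null `description` array stays null instead of raising an exception.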

Upvotes: 1
