I have a DataFrame df1 formatted as follows:
+--------------------------+
|DateInfos |
+--------------------------+
|[[3, A, 111], [4, B, 222]]|
|[[1, C, 333], [2, D, 444]]|
|[[5, E, 555]] |
+--------------------------+
I would like to concatenate the second and third elements of each 3-tuple with the separator "-" to obtain df2:
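Leaving Spark aside, the requested change is a plain map over a sequence of 3-tuples. The sketch below shows only that per-row logic in ordinary Scala. It assumes the third field is a string, as in df1's schema, and the names `ConcatSketch` and `concatSecondThird` are made up for illustration:

```scala
object ConcatSketch {
  // Keep the first field; join the second and third fields with "-".
  def concatSecondThird(xs: Seq[(Int, String, String)]): Seq[(Int, String)] =
    xs.map { case (id, letter, code) => (id, s"$letter-$code") }

  def main(args: Array[String]): Unit = {
    val row = Seq((3, "A", "111"), (4, "B", "222"))
    println(concatSecondThird(row)) // prints List((3,A-111), (4,B-222))
  }
}
```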
+------------------------+
|DateInfos |
+------------------------+
|[[3, A-111], [4, B-222]]|
|[[1, C-333], [2, D-444]]|
|[[5, E-555]] |
+------------------------+
Here is the schema of df1:
root
|-- DateInfos: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: integer (nullable = false)
| | |-- _2: string (nullable = true)
| | |-- _3: string (nullable = true)
I assume I have to create a UDF that uses a function with the following signature:
def concatDF1(array: Array[(Int, String, String)]): Array[(Int, String)] = {
  // keep the first field and join the second and third fields with "-"
  array.map(elem => (elem._1, elem._2 + "-" + elem._3))
}
I register and apply it like this:
val concat_udf = sqlContext.udf.register("concat_udf", concatDF1 _)
val df2_temp = df1.withColumn("DateInfos_temp", concat_udf(df1("DateInfos")))
val df2 = df2_temp.drop("DateInfos").withColumnRenamed("DateInfos_temp", "DateInfos")
I get this error:
Caused by: org.apache.spark.SparkException: Failed to execute user defined function(anonfun$4: (array<struct<_1:int,_2:string,_3:string>>) => array<struct<_1:int,_2:string>>)
Any idea what is causing this?
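A likely cause is the parameter type. A Scala UDF receives an `array` column as a `Seq` (a `WrappedArray` in Spark 2.x), and each `struct` element arrives as a `Row` rather than a tuple. Spark therefore cannot cast the argument to `Array[(Int, String, String)]`, and the resulting `ClassCastException` is wrapped in the `SparkException`. The plain-Scala snippet below only imitates that mismatch, with no Spark involved. It assumes a `Seq` of tuples as a stand-in for what Spark passes, and `CastMismatchDemo` and `tryCall` are illustrative names:

```scala
object CastMismatchDemo {
  // Same shape as the question's function: it declares an Array of tuples.
  def concatDF1(array: Array[(Int, String, String)]): Array[(Int, String)] =
    array.map(elem => (elem._1, elem._2 + "-" + elem._3))

  // Stand-in for the Seq that Spark passes to the UDF (assumption for illustration).
  val passedIn: Any = Seq((3, "A", "111"))

  // Casting a Seq to an Array type fails at runtime with ClassCastException.
  def tryCall(): Either[String, Array[(Int, String)]] =
    try Right(concatDF1(passedIn.asInstanceOf[Array[(Int, String, String)]]))
    catch { case e: ClassCastException => Left(e.getClass.getSimpleName) }

  def main(args: Array[String]): Unit = println(tryCall())
}
```

In real Spark, changing the parameter to `Seq[(Int, String, String)]` would still fail because the elements are `Row` objects. That is why the parameter type should be `Seq[Row]`.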
This should do the job:
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
val sparkSession = ...
import sparkSession.implicits._
val input = sparkSession.sparkContext.parallelize(Seq(
  Seq((3, "A", 111), (4, "B", 222)),
  Seq((1, "C", 333), (2, "D", 444)),
  Seq((5, "E", 555))
)).toDF("DateInfos")
// Spark passes array<struct<...>> to a Scala UDF as Seq[Row], not as an Array of tuples.
// The third field is an Int in this sample input (the question's schema has a string there).
val concatElems = udf { seq: Seq[Row] =>
  seq.map { case Row(x: Int, y: String, z: Int) =>
    (x, s"$y-$z")
  }
}
val output = input.select(concatElems($"DateInfos").as("DateInfos"))
output.show(truncate = false)
This outputs:
+----------------------+
|DateInfos |
+----------------------+
|[[3,A-111], [4,B-222]]|
|[[1,C-333], [2,D-444]]|
|[[5,E-555]] |
+----------------------+
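In the question's df1, the third field is a `string`, so the pattern `z: Int` above would not match those rows and would raise a `MatchError` inside the UDF. Below is a minimal sketch adapted to that schema. It reads the fields by position with `getInt`/`getString` instead of pattern matching. It is written against the Spark 2.x API and assumes a local `SparkSession`. The object name, app name, and sample rows are illustrative only:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.udf

object ConcatUdfSketch {
  // array<struct<_1:int,_2:string,_3:string>> arrives as Seq[Row];
  // return (first field, "second-third") for every element.
  val concatElems = udf((seq: Seq[Row]) =>
    seq.map(r => (r.getInt(0), s"${r.getString(1)}-${r.getString(2)}"))
  )

  def run(spark: SparkSession): Seq[Seq[(Int, String)]] = {
    import spark.implicits._
    // Sample data with the same schema as df1 in the question.
    val df1 = Seq(
      Seq((3, "A", "111"), (4, "B", "222")),
      Seq((1, "C", "333"), (2, "D", "444")),
      Seq((5, "E", "555"))
    ).toDF("DateInfos")
    // withColumn with an existing name replaces the column, so no temp column is needed.
    val df2 = df1.withColumn("DateInfos", concatElems($"DateInfos"))
    df2.as[Seq[(Int, String)]].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("concat-udf-sketch").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```

Replacing the column in a single `withColumn` call also avoids the drop/rename step from the question.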