How can I append an item to an array column in a DataFrame (Spark 2.3)?
Here is an example with simple numeric values, but my real case uses structs.
Input:
+---+--------+
|key|my_arr  |
+---+--------+
|5  |[3,14]  |
|3  |[9,5.99]|
+---+--------+
Output:
+----------+
|my_arr    |
+----------+
|[3,14,5]  |
|[9,5.99,3]|
+----------+
Solution without UDF (PySpark)
I was facing a similar problem and definitely didn't want to use a UDF because of the performance cost.
Input:
spark_df.show(3, False)
+---+-----------+
|key|myarr |
+---+-----------+
|5 |[3.0, 14.0]|
|3 |[9.0, 5.99]|
+---+-----------+
Transformation:
from pyspark.sql import functions as F

spark_df = spark_df.withColumn(
    "myarr",
    F.split(F.concat(F.concat_ws(",", F.col("myarr")), F.lit(","), F.col("key")), r",\s*"),
)
Output:
spark_df.select("myarr").show(3, False)
+------------+
|myarr |
+------------+
|[3.0,14.0,5]|
|[9.0,5.99,3]|
+------------+
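Method steps:
1. `F.concat_ws(",", F.col("myarr"))` joins the array elements into one string, e.g. `3.0,14.0`.
2. `F.concat(..., F.lit(","), F.col("key"))` appends a comma and the key, giving `3.0,14.0,5`.
3. `F.split(..., r",\s*")` splits that string back into an array.
Note that the result is an array of strings, so the original element type is lost, and the trick cannot produce an array of structs.
Hope this helps.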
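Here is another way, using `struct`. Note that the `My_Array.*` star expansion only works when `My_Array` is a struct column (Spark 2.x `show()` prints structs in brackets too, e.g. `[3,14]`), and the result is a struct with an extra field, not a longer array: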
Input:
df.show()
+---+--------+
|Key|My_Array|
+---+--------+
| 5| [3,14]|
| 3| [9,45]|
+---+--------+
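import org.apache.spark.sql.functions.struct
import spark.implicits._ // enables the $"..." column syntax
df.withColumn("My_Array", struct($"My_Array.*", $"Key")).show(false)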
Output:
+---+--------+
|Key|My_Array|
+---+--------+
|5 |[3,14,5]|
|3 |[9,45,3]|
+---+--------+
In Spark 2.3 you can create a UDF to append elements (the built-in `concat` only accepts array columns from Spark 2.4 on). With integers it is easy; with structs it is more complicated.
With integers the code is:
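```scala
import org.apache.spark.sql.functions.{col, udf}

// Spark passes array<int> values as a Seq (a WrappedArray in 2.x); :+ appends key
val udfConcat = udf((key: Int, my_arr: Seq[Int]) => my_arr :+ key)
df.withColumn("my_arr", udfConcat(col("key"), col("my_arr"))).drop("key").show()
```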
With structs the code is:
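```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructType}

// Element type of the array: a struct with a name and an age
val schemaTyped = new StructType()
  .add("name", StringType)
  .add("age", IntegerType)
val schema = ArrayType(schemaTyped)

// Untyped udf(f, dataType): the return schema is passed explicitly because
// Spark cannot infer a schema from Seq[Row]
val udfConcatStruct = udf((key: Row, my_arr: Seq[Row]) => my_arr :+ key, schema)
df2.withColumn("my_arr", udfConcatStruct(col("key"), col("my_arr"))).drop("key").show(false)
```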
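When you create the UDF, you must pass the schema of the returned array; in this example it is an array of structs with a name and an age. This untyped `udf(f, dataType)` variant works in Spark 2.x but is disabled by default in Spark 3.x.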