Venus

Reputation: 146

How to append item to array in Spark 2.3

How can I append an item to an array column in a DataFrame (Spark 2.3)?

Here is an example with numbers, but my real case is an array of structs.

Input:

+------+-------------+
|   key|     my_arr  |
+------+-------------+
|5     |[3,14]       |
|3     |[9,5.99]     |
+------+-------------+

Output:

+-------------+
|     my_arr  |
+-------------+
|[3,14,5]     |
|[9,5.99,3]   |
+-------------+

Upvotes: 1

Views: 3078

Answers (3)

Abhishek

Reputation: 1625

Solution without a UDF (PySpark)

I faced a similar problem and definitely did not want to use a UDF because of the performance degradation.

spark_df.show(3,False)

    +---+-----------+
    |key|myarr      |
    +---+-----------+
    |5  |[3.0, 14.0]|
    |3  |[9.0, 5.99]|
    +---+-----------+

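Code and output:

from pyspark.sql import functions as F

# Join the array into a string, append the key, then split back (elements become strings).
spark_df = spark_df.withColumn("myarr",
    F.split(F.concat(F.concat_ws(",", F.col("myarr")), F.lit(","), F.col("key")), r",\s*"))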

spark_df.select("myarr").show(3,False)


    +------------+
    |myarr       |
    +------------+
    |[3.0,14.0,5]|
    |[9.0,5.99,3]|
    +------------+

Method Steps:

  1. Convert the array column into a string using concat_ws.
  2. Use concat to append the required column ("key") to the string built from the original column ("myarr").
  3. Use split to convert the string from the previous step back into an array.
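
The steps above can be traced on a single row in plain Python, without Spark. This is only a sketch of the idea: `append_via_string` is a hypothetical helper, not a PySpark function, and it stands in for `concat_ws`, `concat`, and `split` with ordinary string operations. It also makes one side effect visible: because the array goes through a string, every element comes back as a string.

```python
import re

def append_via_string(arr, key):
    """Mimic concat_ws -> concat -> split on one row (hypothetical helper)."""
    joined = ",".join(str(x) for x in arr)   # step 1: array -> "3.0,14.0"
    merged = joined + "," + str(key)         # step 2: append "," and the key
    return re.split(r",\s*", merged)         # step 3: string -> list of strings

result = append_via_string([3.0, 14.0], 5)
print(result)  # ['3.0', '14.0', '5']
```

If the numeric type matters, the resulting array would need to be cast back afterwards, and this approach is not safe when the elements themselves can contain commas.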

Hope this helps.

Upvotes: 0

1pluszara

Reputation: 1528

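Here is another way using struct. Note that `My_Array.*` expands the fields of a struct, so this applies when My_Array is a struct column rather than a true array column: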

Input:

df.show()
+---+--------+
|Key|My_Array|
+---+--------+
|  5|  [3,14]|
|  3|  [9,45]|
+---+--------+

df.withColumn("My_Array", struct($"My_Array.*", $"Key")).show(false)

Output:

+---+--------+
|Key|My_Array|
+---+--------+
|5  |[3,14,5]|
|3  |[9,45,3]|
+---+--------+  

Upvotes: 1

Francoceing C

Reputation: 175

You need to create a UDF to append elements. With integers it is easy, but with structs it is more complicated.

With integers, the code is:

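    import org.apache.spark.sql.functions.{col, udf}

    // Append the key to the end of the array, then drop the key column.
    val udfConcat = udf((key: Int, my_arr: Seq[Int]) => my_arr :+ key)
    df.withColumn("my_arr", udfConcat(col("key"), col("my_arr"))).drop("key").show()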

With structs, the code is:

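    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{col, udf}
    import org.apache.spark.sql.types._

    // Schema of one array element; the UDF returns an array of these structs.
    val schemaTyped = new StructType()
      .add("name", StringType)
      .add("age", IntegerType)
    val schema = ArrayType(schemaTyped)
    val udfConcatStruct = udf((key: Row, my_arr: Seq[Row]) => my_arr :+ key, schema)
    df2.withColumn("my_arr", udfConcatStruct(col("key"), col("my_arr"))).drop("key").show(false)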

When you create the UDF, you must pass the schema of the returned array; in this example it is an array of elements with a name and an age.

Upvotes: 1
