Yeskay

Reputation: 53

UDF over the array elements in Pyspark

I have a dataframe like below

col1
------
[{"a":"1","b":"2"},{"a":"11,"b":"22"}]

Now I want to add a new struct to each element, built from an existing value, e.g. {"cc": "1"}, where the 1 comes from "a": "1":

col1
------
[{"a":"1","b":"2", {"cc": "1" }},{"a":"11,"b":"22",{"cc": "11" } }]  

Please suggest a UDF in PySpark to achieve this.

Upvotes: 0

Views: 111

Answers (1)

Mohana B C

Reputation: 5487

You can use the transform function (available from Spark 2.4) to get the desired result.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, schema_of_json

spark = SparkSession.builder.master('local[*]').getOrCreate()

df = spark.createDataFrame([('[{"a":"1","b":"2"},{"a":"11","b":"22"}]',)], "col1 string")

# Parse the JSON string into an array of structs, then rebuild each element,
# adding a nested struct "cc" derived from the existing field "a".
df.withColumn("col1", from_json("col1", schema_of_json(df.select("col1").first()[0]))) \
    .selectExpr("to_json(transform(col1, x -> "
                "struct(x.a as a, x.b as b, struct(x.a as cc) as cc))) as col1") \
    .show(truncate=False)

    +------------------------------------------------------------------------+
    |col1                                                                    |
    +------------------------------------------------------------------------+
    |[{"a":"1","b":"2","cc":{"cc":"1"}},{"a":"11","b":"22","cc":{"cc":"11"}}]|
    +------------------------------------------------------------------------+
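Since the question explicitly asks for a UDF: a plain Python UDF would also work, at the cost of serializing the rows through Python. A minimal sketch, assuming col1 is still the raw JSON string and using a hypothetical helper named add_cc:

import json
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Illustrative UDF: parse the JSON array, add a nested {"cc": <value of "a">}
# to every element, and serialize the result back to a JSON string.
@udf(returnType=StringType())
def add_cc(json_str):
    rows = json.loads(json_str)
    for row in rows:
        row["cc"] = {"cc": row["a"]}
    return json.dumps(rows)

df.withColumn("col1", add_cc("col1")).show(truncate=False)

The built-in transform shown above is generally preferable because it runs entirely inside the JVM, but the UDF version keeps the logic in plain Python if the transformation gets more involved.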

Upvotes: 1
