Reputation: 53
I have a DataFrame like the one below:
col1
------
[{"a":"1","b":"2"},{"a":"11","b":"22"}]
Now I want to add a new struct to each element, built from an existing value: {"cc": "1"}, where 1 comes from "a": "1". The expected result is:
col1
------
[{"a":"1","b":"2", {"cc": "1" }},{"a":"11","b":"22",{"cc": "11" } }]
Please suggest a UDF in PySpark for this.
Upvotes: 0
Views: 111
Reputation: 5487
You can use the transform higher-order function (available since Spark 2.4) to get the desired result; no UDF is needed.
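from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, schema_of_json

spark = SparkSession.builder.master('local[*]').getOrCreate()
df = spark.createDataFrame([('[{"a":"1","b":"2"},{"a":"11","b":"22"}]',)], "col1 string")

# Infer the array-of-struct schema from the first row, then parse the JSON string.
sample = df.select("col1").first()[0]
parsed = df.withColumn("col1", from_json("col1", schema_of_json(sample)))

# Rebuild each element with an extra nested struct cc, then serialize back to JSON.
parsed.selectExpr(
    "to_json(transform(col1, x -> "
    "struct(x.a as a, x.b as b, struct(x.a as cc) as cc))) as co1"
).show(truncate=False)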
+------------------------------------------------------------------------+
|co1                                                                     |
+------------------------------------------------------------------------+
|[{"a":"1","b":"2","cc":{"cc":"1"}},{"a":"11","b":"22","cc":{"cc":"11"}}]|
+------------------------------------------------------------------------+
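Since the question asked for a UDF, here is a minimal sketch of the same per-element logic as a plain Python function, assuming col1 holds a JSON array string as in the example above. The name add_cc is made up for illustration, and the Spark wiring at the bottom is left as a comment rather than run. The built-in transform approach is usually preferable because it avoids moving every row through the Python interpreter.

```python
import json

def add_cc(col1_json):
    """Add a nested {"cc": <value of "a">} struct to each element of a JSON array string."""
    if col1_json is None:  # Spark passes NULL values to a Python UDF as None
        return None
    elements = json.loads(col1_json)
    # Copy each element and append the new nested struct built from its "a" value
    enriched = [{**e, "cc": {"cc": e.get("a")}} for e in elements]
    # Compact separators mirror the style of Spark's to_json output
    return json.dumps(enriched, separators=(",", ":"))

print(add_cc('[{"a":"1","b":"2"},{"a":"11","b":"22"}]'))

# Hypothetical wiring into Spark (assumes the df from the answer above; not executed here):
# from pyspark.sql.functions import udf
# from pyspark.sql.types import StringType
# df.withColumn("col1", udf(add_cc, StringType())("col1")).show(truncate=False)
```

For the sample row this prints the same JSON string as the transform version shown above.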
Upvotes: 1