Reputation: 743
from pyspark.sql.functions import col, udf
from pyspark.sql.types import FloatType, MapType, StringType

@udf(returnType=MapType(StringType(), FloatType()))
def postprocess(data):
    ret = dict()
    ....
    # insert keys and values into the dictionary from data
    ...
    return ret

ret = postprocess(col('data'))
print(ret)  # Column<'postprocess(data)'>
I would like to create multiple columns from this dictionary (map) column.
For example, if ret contains {"key1": 0.1, "key2": 0.3}, the result should be:
| key1 | key2 |
| 0.1 | 0.3 |
How can I create it?
Upvotes: 0
Views: 179
Reputation: 2033
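To achieve your goal, you can start from explode(), which turns each entry of a map column into a row with key and value columns. To get one column per key, either pivot that result or select the keys directly with getItem(). Details: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html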
However, from a performance perspective, and depending on how complicated your UDF is, I think you should use built-in Spark SQL functions to create the columns instead of a Python UDF if possible. You can check this post: https://stackoverflow.com/a/38297050/10445333
Upvotes: 1