alryosha

Reputation: 743

Pyspark create multiple columns from dictionary column

from pyspark.sql.functions import col, udf
from pyspark.sql.types import FloatType, MapType, StringType

@udf(returnType=MapType(StringType(), FloatType()))
def postprocess(data):
    ret = dict()
    ...
    # insert keys and values into the dictionary from data
    ...

    return ret

ret = postprocess(col('data'))
print(ret)  # Column<'postprocess(data)'>

I would like to create multiple columns from a dictionary (map) column, one column per key.

If ret contains {"key1": 0.1, "key2": 0.3}, the result should be

| key1 | key2 |
|------|------|
| 0.1  | 0.3  |

How can I create it?

Upvotes: 0

Views: 179

Answers (1)

Jonathan

Reputation: 2033

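To achieve your goal, you can start with explode() from pyspark.sql.functions. Note that on a map column, explode() does not create one column per key by itself: it produces one row per entry, with a key column and a value column. To get one column per key, pivot on the key afterwards, or, if the keys are known in advance, select each one directly with getItem(). Details: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html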

However, from a performance perspective, and depending on how complicated your UDF is, I think you should build the columns with Spark SQL functions instead of a Python UDF if that is possible. You can check this post: https://stackoverflow.com/a/38297050/10445333

Upvotes: 1
