Reputation: 13
I have a column in PySpark containing dictionary/map-like values that are stored as strings.
Example Values:
'{1:'Hello', 2:'Hi', 3:'Hola'}'
'{1:'Dogs', 2:'Dogs, Cats, and Fish', 3:'Fish & Turtles'}'
'{1:'Pizza'}'
I'd like to convert these strings into either an array or a map, so I can then use explode() on them to create a row for each key-value pair. I would split() on each comma, but since some values contain commas themselves, this does not work.
I tried wrapping ast.literal_eval() in a udf, but when I run it on the column of interest, it still returns a string instead of a MapType object. Any thoughts on the best way to approach this problem?
Upvotes: 1
Views: 818
Reputation: 42392
You need to specify the return type as a map if you want to use literal_eval; a udf returns StringType by default, which is why yours still gave back a string.
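For reference, a DataFrame matching the question's examples can be built like this (a minimal sketch; the column name col and an existing SparkSession named spark are assumptions):

df = spark.createDataFrame([
    ("{1:'Hello', 2:'Hi', 3:'Hola'}",),
    ("{1:'Dogs', 2:'Dogs, Cats, and Fish', 3:'Fish & Turtles'}",),
    ("{1:'Pizza'}",),
], ['col'])

With that in place, pass the desired schema as the udf's return type: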
from ast import literal_eval
import pyspark.sql.functions as F

# Wrap literal_eval in a udf with an explicit map<int,string> return type;
# without it, the udf would default to returning StringType.
df2 = df.withColumn('col', F.udf(literal_eval, 'map<int,string>')('col'))
df2.show(truncate=False)
+-----------------------------------------------------------+
|col |
+-----------------------------------------------------------+
|[1 -> Hello, 2 -> Hi, 3 -> Hola] |
|[1 -> Dogs, 2 -> Dogs, Cats, and Fish, 3 -> Fish & Turtles]|
|[1 -> Pizza] |
+-----------------------------------------------------------+
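From here, explode can be applied to the parsed map column to get one row per key-value pair, as the question intends. A minimal sketch, assuming df2 from above; exploding a map column yields key and value columns:

# One row per map entry, split into 'key' and 'value' columns.
df3 = df2.select(F.explode('col').alias('key', 'value'))
df3.show(truncate=False)

The DDL string 'map<int,string>' is just shorthand for MapType(IntegerType(), StringType()) from pyspark.sql.types; either form works as the udf's return type.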
Upvotes: 1