user15803585

Reputation: 13

Split PySpark Map-like string into Map Object

I have a column in PySpark containing dictionary/map-like values that are stored as strings.

Example Values:

'{1:'Hello', 2:'Hi', 3:'Hola'}'
'{1:'Dogs', 2:'Dogs, Cats, and Fish', 3:'Fish & Turtles'}'
'{1:'Pizza'}'

I'd like to convert these strings into a map (or array), so I can then use explode() on them to create a row for each key-value pair. I tried splitting on each comma with .split(), but since some values contain commas, this does not work.

I was using the ast.literal_eval() function wrapped in a udf, but when I run it on the column of interest, it still returns a string instead of a MapType column. Any thoughts on the best way to approach this?
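A minimal plain-Python check (sample string taken from above) shows that literal_eval itself handles the embedded commas fine, so the parsing isn't the problem:

```python
# ast.literal_eval parses the map-like string into a dict, even when
# values contain commas; splitting on commas would break this string.
from ast import literal_eval

s = "{1:'Dogs', 2:'Dogs, Cats, and Fish', 3:'Fish & Turtles'}"
d = literal_eval(s)
print(d[2])  # the commas inside the value are preserved
```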

Upvotes: 1

Views: 818

Answers (1)

mck

Reputation: 42392

You need to specify the return type as a map if you want to use literal_eval:

from ast import literal_eval
import pyspark.sql.functions as F

df2 = df.withColumn('col', F.udf(literal_eval, 'map<int,string>')('col'))

df2.show(truncate=False)
+-----------------------------------------------------------+
|col                                                        |
+-----------------------------------------------------------+
|[1 -> Hello, 2 -> Hi, 3 -> Hola]                           |
|[1 -> Dogs, 2 -> Dogs, Cats, and Fish, 3 -> Fish & Turtles]|
|[1 -> Pizza]                                               |
+-----------------------------------------------------------+

Upvotes: 1
