Reputation: 13
I have a column in PySpark containing dictionary/map-like values that are stored as strings.
Example Values:
'{1:'Hello', 2:'Hi', 3:'Hola'}'
'{1:'Dogs', 2:'Dogs, Cats, and Fish', 3:'Fish & Turtles'}'
'{1:'Pizza'}'
I'd like to convert these strings into either an array or a map, so I can then use explode() on them to create a row for each key-value pair. I would split() on each comma, but since some values contain commas themselves, this does not work.
I tried wrapping ast.literal_eval() in a udf, but when I run it on the column of interest, it still returns a string instead of a MapType object. Any thoughts on the best way to approach this problem?
Upvotes: 1
Views: 818
Reputation: 42392
You need to specify the return type as a map if you want to use literal_eval; a udf returns StringType by default, which is why yours still gave back a string.
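For reference, a DataFrame matching the question's examples can be built like this (a minimal sketch; the column name col and an existing SparkSession named spark are assumptions):

df = spark.createDataFrame([
    ("{1:'Hello', 2:'Hi', 3:'Hola'}",),
    ("{1:'Dogs', 2:'Dogs, Cats, and Fish', 3:'Fish & Turtles'}",),
    ("{1:'Pizza'}",),
], ['col'])

With that in place, pass the desired schema as the udf's return type: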
from ast import literal_eval
import pyspark.sql.functions as F

# Wrap literal_eval in a udf with an explicit map<int,string> return type;
# without it, the udf would default to returning StringType.
df2 = df.withColumn('col', F.udf(literal_eval, 'map<int,string>')('col'))
df2.show(truncate=False)
+-----------------------------------------------------------+
|col |
+-----------------------------------------------------------+
|[1 -> Hello, 2 -> Hi, 3 -> Hola] |
|[1 -> Dogs, 2 -> Dogs, Cats, and Fish, 3 -> Fish & Turtles]|
|[1 -> Pizza] |
+-----------------------------------------------------------+
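From here, explode can be applied to the parsed map column to get one row per key-value pair, as the question intends. A minimal sketch, assuming df2 from above; exploding a map column yields key and value columns:

# One row per map entry, split into 'key' and 'value' columns.
df3 = df2.select(F.explode('col').alias('key', 'value'))
df3.show(truncate=False)

The DDL string 'map<int,string>' is just shorthand for MapType(IntegerType(), StringType()) from pyspark.sql.types; either form works as the udf's return type.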
Upvotes: 1