Rahul Diggi

Reputation: 368

How to add empty map<string,string> type column to DataFrame in PySpark?

I tried the code below, but it's not working:

df=df.withColumn("cars", typedLit(Map.empty[String, String]))

It gives the error: NameError: name 'typedLit' is not defined

Upvotes: 2

Views: 1367

Answers (2)

mazaneicha

Reputation: 9417

Perhaps you can use pyspark.sql.functions.expr:

>>> from pyspark.sql.functions import *
>>> df.withColumn("cars",expr("map()")).printSchema()                                                                                                       
root
 |-- col1: string (nullable = true)
 |-- cars: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)

EDIT:

If you'd like your map to have keys and/or values of a non-trivial type (unlike the map<string,string> in your question's title), some casting becomes unavoidable, I'm afraid. For example:

>>> from pyspark.sql.types import IntegerType, DoubleType
>>> df.withColumn("cars",create_map(lit(None).cast(IntegerType()),lit(None).cast(DoubleType()))).printSchema()
root
 |-- col1: string (nullable = true)
 |-- cars: map (nullable = false)
 |    |-- key: integer
 |    |-- value: double (valueContainsNull = true)

...in addition to other options suggested by @blackbishop and @Steven. And just beware of the consequences :) -- maps can't have null keys!

Upvotes: 1

Steven

Reputation: 15258

Create a null column and cast it to the map type you need.

from pyspark.sql import functions as F, types as T

df = df.withColumn("cars", F.lit(None).cast(T.MapType(T.StringType(), T.StringType())))
df.select("cars").printSchema()
root
 |-- cars: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Upvotes: 2
