Pingjiang Li

Reputation: 747

How to add an empty map type column to DataFrame?

I want to add a new map type column to a dataframe, like this:

|-- cMap: map (nullable = true)
|    |-- key: string
|    |-- value: string (valueContainsNull = true)

I tried the code:

df.withColumn("cMap", lit(null).cast(MapType)).printSchema

The error is:

<console>:132: error: overloaded method value cast with alternatives:
(to: String)org.apache.spark.sql.Column
(to: org.apache.spark.sql.types.DataType)org.apache.spark.sql.Column cannot be applied to (org.apache.spark.sql.types.MapType.type)

Is there another way to cast the new column to Map or MapType?

Upvotes: 10

Views: 6888

Answers (4)

ZygD

Reputation: 24356

map().cast("map<string,string>")

For me, lit(null).cast(...) was not an option, because map_concat of that null column with a non-empty map returned null, while the empty map above worked well.
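A minimal sketch of the difference described above, assuming Spark 3.x (map_concat needs Spark 2.4 or later) and a local session. The DataFrame, the column names, and the `extra` map are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map, map_concat}

// Hypothetical local session and toy DataFrame, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("empty-map-sketch").getOrCreate()
val df = spark.range(2).toDF("id")

// Column holding a null of type map<string,string>.
val withNullMap = df.withColumn("cMap", lit(null).cast("map<string,string>"))
// Column holding an actual empty map of the same type.
val withEmptyMap = df.withColumn("cMap", map().cast("map<string,string>"))

// A non-empty map to merge in; cast so both sides have the same declared type.
val extra = map(lit("k"), lit("v")).cast("map<string,string>")

// Concatenating with the null column gives null; with the empty map it gives the merged map.
val mergedFromNull = withNullMap.select(map_concat(col("cMap"), extra).as("merged"))
val mergedFromEmpty = withEmptyMap.select(map_concat(col("cMap"), extra).as("merged"))

mergedFromNull.show()
mergedFromEmpty.show()
```

Which variant fits depends on whether downstream code should treat "no map yet" as null or as an empty map.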

Upvotes: 0

bartholomaios

Reputation: 123

I had the same problem and finally found a solution:

df.withColumn("cMap", typedLit(Map.empty[String, String])) 

From ScalaDocs for typedLit:

The difference between this function and [[lit]] is that this function can handle parameterized scala types e.g.: List, Seq and Map.
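For completeness, a self-contained sketch of the typedLit approach with its imports, assuming Spark 2.2 or later (where typedLit is available) and a local session. The DataFrame and the `entries` column are hypothetical. The `size` check is only there to show that the column holds a real empty map rather than null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, size, typedLit}
import org.apache.spark.sql.types.{MapType, StringType}

// Hypothetical local session and toy DataFrame, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("typedlit-sketch").getOrCreate()
val df = spark.range(3).toDF("id")

// typedLit keeps the Scala type parameters, so the column is typed map<string,string>.
val withMapCol = df.withColumn("cMap", typedLit(Map.empty[String, String]))

withMapCol.printSchema()

// Each row holds an empty map (zero entries), not a null.
val sizes = withMapCol.select(col("id"), size(col("cMap")).as("entries"))
sizes.show()
```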

Upvotes: 5

Jacek Laskowski

Reputation: 74619

You can stay as strongly typed in Scala as the other answer(s), or use a little trick with stringified types.

val withMapCol = df.withColumn("cMap", lit(null) cast "map<string, string>")
scala> withMapCol.printSchema
root
 |-- id: long (nullable = false)
 |-- cMap: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

You can use any type that Spark SQL supports this way. The accepted type strings are defined by the dataType rule of the SQL parser grammar:

dataType
    : complex=ARRAY '<' dataType '>'                            #complexDataType
    | complex=MAP '<' dataType ',' dataType '>'                 #complexDataType
    | complex=STRUCT ('<' complexColTypeList? '>' | NEQ)        #complexDataType
    | identifier ('(' INTEGER_VALUE (',' INTEGER_VALUE)* ')')?  #primitiveDataType
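As a rough illustration of that grammar, the same string-based cast can be tried with the other complex types. This is a sketch assuming a local session; the column names (`tags`, `point`, `counts`) are invented for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{ArrayType, MapType, StructType}

// Hypothetical local session and toy DataFrame, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("ddl-cast-sketch").getOrCreate()
val df = spark.range(3).toDF("id")

// One null column per complex type from the grammar: ARRAY, STRUCT and MAP.
val withNulls = df
  .withColumn("tags", lit(null).cast("array<string>"))
  .withColumn("point", lit(null).cast("struct<x:double,y:double>"))
  .withColumn("counts", lit(null).cast("map<string,int>"))

withNulls.printSchema()
```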

Upvotes: 1

Tzach Zohar

Reputation: 37822

Unlike other types, MapType isn't an object you can use as-is (it's not an object extending DataType). You have to call MapType.apply(...), which expects the key and value types as arguments and returns an instance of the MapType class:

df.withColumn("cMap", lit(null).cast(MapType(StringType, StringType))) 
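A self-contained version of this with the imports it needs, as a sketch assuming a local session and a toy DataFrame (both hypothetical). The two-argument MapType(keyType, valueType) sets valueContainsNull to true, which matches the schema asked for in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{MapType, StringType}

// Hypothetical local session and toy DataFrame, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("maptype-sketch").getOrCreate()
val df = spark.range(3).toDF("id")

// MapType.apply builds a concrete DataType instance; the bare name MapType is only the companion object.
val mapOfStrings = MapType(StringType, StringType)

// Cast a null literal to that type to get a nullable map<string,string> column.
val withMapCol = df.withColumn("cMap", lit(null).cast(mapOfStrings))

withMapCol.printSchema()
```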

Upvotes: 3
