Reputation: 747
I want to add a new map type column to a dataframe, like this:
|-- cMap: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
I tried the code:
df.withColumn("cMap", lit(null).cast(MapType)).printSchema
The error is:
<console>:132: error: overloaded method value cast with alternatives:
  (to: String)org.apache.spark.sql.Column <and>
  (to: org.apache.spark.sql.types.DataType)org.apache.spark.sql.Column
 cannot be applied to (org.apache.spark.sql.types.MapType.type)
Is there another way to cast the new column to Map or MapType?
Upvotes: 10
Views: 6888
Reputation: 24356
map().cast("map<string,string>")
For me, lit(null).cast(...) was not an option, because map_concat of that null column with a non-empty map returned null, whereas the empty map above worked well.
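To make the difference concrete, here is a minimal sketch that can be pasted into spark-shell. It assumes Spark 2.4 or later (where map_concat is available); the column names nullMap, emptyMap and extra are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map, map_concat}

// In spark-shell this simply returns the existing session.
val spark = SparkSession.builder().master("local[1]").appName("empty-vs-null-map").getOrCreate()

val df = spark.range(1)
  .withColumn("nullMap", lit(null).cast("map<string,string>"))   // typed, but the value is null
  .withColumn("emptyMap", map().cast("map<string,string>"))      // typed, and the value is an empty map
  .withColumn("extra", map(lit("k"), lit("v")))                  // hypothetical non-empty map to merge in

// Concatenating with the null map yields null; concatenating with the empty map keeps "extra".
val merged = df.select(
  map_concat(col("nullMap"), col("extra")).as("fromNull"),
  map_concat(col("emptyMap"), col("extra")).as("fromEmpty"))

merged.show(false)
```

So if the new column is meant to be merged into later, starting from an empty map rather than a null one is probably what you want.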
Upvotes: 0
Reputation: 123
I had the same problem and finally found a solution:
df.withColumn("cMap", typedLit(Map.empty[String, String]))
From the ScalaDoc for typedLit:
The difference between this function and [[lit]] is that this function can handle parameterized scala types e.g.: List, Seq and Map.
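A small self-contained sketch of this, assuming Spark 2.2 or later (where typedLit exists) and a throwaway one-row DataFrame in place of the original df:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.typedLit

// In spark-shell this simply returns the existing session.
val spark = SparkSession.builder().master("local[1]").appName("typedlit-sketch").getOrCreate()

// typedLit takes the key and value types from the Scala type Map[String, String].
val withMap = spark.range(1).withColumn("cMap", typedLit(Map.empty[String, String]))
withMap.printSchema()

// The column holds an empty map in every row, not null.
val firstMap = withMap.head().getMap[String, String](1)
```

Note that this gives you an empty map literal rather than a null value, which may or may not be what the question needs.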
Upvotes: 5
Reputation: 74619
You can stay in pure Scala, as in the other answer(s), or use a little trick with stringified types:
val withMapCol = df.withColumn("cMap", lit(null) cast "map<string, string>")
scala> withMapCol.printSchema
root
|-- id: long (nullable = false)
|-- cMap: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
You can use any type that Spark SQL supports this way. The accepted type strings follow the dataType rule in Spark's SQL grammar:
dataType
: complex=ARRAY '<' dataType '>' #complexDataType
| complex=MAP '<' dataType ',' dataType '>' #complexDataType
| complex=STRUCT ('<' complexColTypeList? '>' | NEQ) #complexDataType
| identifier ('(' INTEGER_VALUE (',' INTEGER_VALUE)* ')')? #primitiveDataType
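As a sketch of what that grammar allows, the same trick should work for arrays, nested maps and structs. The column names below are invented for illustration, and a one-row spark.range stands in for the original df:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// In spark-shell this simply returns the existing session.
val spark = SparkSession.builder().master("local[1]").appName("ddl-cast-sketch").getOrCreate()

// Each cast target is a type string that follows the dataType rule quoted above.
val typed = spark.range(1)
  .withColumn("cArr", lit(null).cast("array<int>"))
  .withColumn("cNested", lit(null).cast("map<string, array<string>>"))
  .withColumn("cStruct", lit(null).cast("struct<a: int, b: string>"))

typed.printSchema()
```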
Upvotes: 1
Reputation: 37822
Unlike other types, MapType isn't an object you can just use as-is (it's not an object extending DataType). You have to call MapType.apply(...), which expects the key and value types as arguments and returns an instance of the MapType class:
df.withColumn("cMap", lit(null).cast(MapType(StringType, StringType)))
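If the companion-object distinction is unfamiliar, here is a plain-Scala sketch that mimics the situation without Spark. The types inside CastModel are simplified stand-ins I made up for illustration, not the real Spark classes:

```scala
object CastModel {
  // Simplified stand-ins; NOT the real org.apache.spark.sql.types classes.
  sealed trait DataType
  case object StringType extends DataType                 // a singleton that is itself a DataType
  case class MapType(keyType: DataType, valueType: DataType) extends DataType

  // Mirrors the shape of Column.cast(to: DataType).
  def cast(to: DataType): String = s"cast to $to"

  val ok1 = cast(StringType)                        // compiles: the object is a DataType
  val ok2 = cast(MapType(StringType, StringType))   // compiles: apply(...) returns a MapType instance
  // val bad = cast(MapType)                        // does not compile: the companion object has type MapType.type
}
```

In real code the one-liner above also needs MapType and StringType in scope, for example via import org.apache.spark.sql.types._.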
Upvotes: 3