Lou_Ds

Reputation: 551

Scala Spark - empty map on DataFrame column for map(String, Int)

I am joining two DataFrames that have columns of type Map[String, Int].

I want the merged DF to have an empty map (Map()) rather than null in the Map-type columns.

val df = dfmerged
  .select(
    col("id"),
    coalesce(col("map_1"), lit(null).cast(MapType(StringType, IntegerType))).alias("map_1"),
    coalesce(col("map_2"), lit(Map.empty[String, Int])).alias("map_2")
  )

For the map_1 column a null is inserted, but I'd like an empty map there instead. The map_2 expression gives me an error:

java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Map$EmptyMap$ Map()

I've also tried with a UDF, like:

case class myStructMap(x: Map[String, Int])
val emptyMap = udf(() => myStructMap(Map.empty[String, Int]))

That also did not work.

When I try something like:

.select(coalesce(col("myMapCol"), lit(map())).alias("brand_viewed_count")...

or

.select(coalesce(col("myMapCol"), lit(map().cast(MapType(LongType, LongType)))).alias("brand_viewed_count")...

I get the error:

cannot resolve 'map()' due to data type mismatch: cannot cast MapType(NullType,NullType,false) to MapType(LongType,IntType,true);

Upvotes: 4

Views: 5643

Answers (1)

Alper t. Turker

Reputation: 35249

In Spark 2.2 or later, use typedLit:

import org.apache.spark.sql.functions.typedLit

val df = Seq((1L, null), (2L, Map("foo" -> 42))).toDF("id", "map")

df.withColumn("map", coalesce($"map", typedLit(Map[String, Int]()))).show
// +---+--------------+
// | id|           map|
// +---+--------------+
// |  1|         Map()|
// |  2|Map(foo -> 42)|
// +---+--------------+

Before 2.2:

df.withColumn("map", coalesce($"map", map().cast("map<string,int>"))).show
// +---+--------------+
// | id|           map|
// +---+--------------+
// |  1|         Map()|
// |  2|Map(foo -> 42)|
// +---+--------------+
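Applied back to the question's select, a minimal sketch (assuming dfmerged has the id, map_1, and map_2 columns from the question):

```scala
import org.apache.spark.sql.functions.{coalesce, col, typedLit}

// typedLit carries the Scala type into the literal, so an empty
// Map[String, Int] becomes a proper map<string,int> column,
// unlike lit(Map.empty[String, Int]), which throws at runtime.
val df = dfmerged.select(
  col("id"),
  coalesce(col("map_1"), typedLit(Map.empty[String, Int])).alias("map_1"),
  coalesce(col("map_2"), typedLit(Map.empty[String, Int])).alias("map_2")
)
```

Both columns then fall back to an empty map instead of null, with no cast needed.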

Upvotes: 10
