Reputation: 551
I am joining two DataFrames, where there are columns of a type Map[String, Int]
I want the merged DF to have an empty map []
and not null
on the Map
type columns.
val df = dfmerged.
.select("id"),
coalesce(col("map_1"), lit(null).cast(MapType(StringType, IntType))).alias("map_1"),
coalesce(col("map_2"), lit(Map.empty[String, Int])).alias("map_2")
for a map_1
column, a null
will be inserted, but I'd like to have an empty map
map_2 is giving me an error:
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Map$EmptyMap$ Map()
I've also tried with an udf
function like:
case class myStructMap(x:Map[String, Int])
val emptyMap = udf(() => myStructMap(Map.empty[String, Int]))
also did not work.
when I try something like:
.select( coalesce(col("myMapCol"), lit(map())).alias("brand_viewed_count")...
or
.select(coalesce(col("myMapCol"), lit(map().cast(MapType(LongType, LongType)))).alias("brand_viewed_count")...
I get the error:
cannot resolve 'map()' due to data type mismatch: cannot cast MapType(NullType,NullType,false) to MapType(LongType,IntType,true);
Upvotes: 4
Views: 5643
Reputation: 35249
In Spark 2.2
import org.apache.spark.sql.functions.typedLit
val df = Seq((1L, null), (2L, Map("foo" -> "bar"))).toDF("id", "map")
df.withColumn("map", coalesce($"map", typedLit(Map[String, Int]()))).show
// +---+-----------------+
// | id| map|
// +---+-----------------+
// | 1| Map()|
// | 2|Map(foobar -> 42)|
// +---+-----------------+
Before
df.withColumn("map", coalesce($"map", map().cast("map<string,int>"))).show
// +---+-----------------+
// | id| map|
// +---+-----------------+
// | 1| Map()|
// | 2|Map(foobar -> 42)|
// +---+-----------------+
Upvotes: 10