vamsi

Reputation: 354

Add new column of Map datatype to Spark DataFrame in Scala

I can create a new DataFrame in which one column has a Map datatype.

val inputDF2 = Seq(
(1, "Visa", 1, Map[String, Int]()), 
(2, "MC", 2, Map[String, Int]())).toDF("id", "card_type", "number_of_cards", "card_type_details")
scala> inputDF2.show(false)
+---+---------+---------------+-----------------+
|id |card_type|number_of_cards|card_type_details|
+---+---------+---------------+-----------------+
|1  |Visa     |1              |[]               |
|2  |MC       |2              |[]               |
+---+---------+---------------+-----------------+

Now I want to add a new column of the same type as card_type_details, so I tried Spark's withColumn method:

inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>").show(false)

+---+---------+---------------+-----------------+----+
|id |card_type|number_of_cards|card_type_details|tmp |
+---+---------+---------------+-----------------+----+
|1  |Visa     |1              |[]               |null|
|2  |MC       |2              |[]               |null|
+---+---------+---------------+-----------------+----+

When I check the schema, both columns have the same map type, but their values differ: card_type_details holds an empty map, while tmp is null.

scala> inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>").printSchema
root
 |-- id: integer (nullable = false)
 |-- card_type: string (nullable = true)
 |-- number_of_cards: integer (nullable = false)
 |-- card_type_details: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)
 |-- tmp: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = true)

I'm not sure whether I'm adding the new column correctly. The problem appears when I call .isEmpty on the tmp column inside a UDF: I get a NullPointerException.

scala> def checkValue = udf((card_type_details: Map[String, Int]) => {
     | var output_map = Map[String, Int]()
     | if (card_type_details.isEmpty) { output_map += 0.toString -> 1 }
     | else {output_map = card_type_details }
     | output_map
     | })
checkValue: org.apache.spark.sql.expressions.UserDefinedFunction

scala> inputDF2.withColumn("value", checkValue(col("card_type_details"))).show(false)
+---+---------+---------------+-----------------+--------+
|id |card_type|number_of_cards|card_type_details|value   |
+---+---------+---------------+-----------------+--------+
|1  |Visa     |1              |[]               |[0 -> 1]|
|2  |MC       |2              |[]               |[0 -> 1]|
+---+---------+---------------+-----------------+--------+

scala> inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>")
.withColumn("value", checkValue(col("tmp"))).show(false)

org.apache.spark.SparkException: Failed to execute user defined function($anonfun$checkValue$1: (map<string,int>) => map<string,int>)

Caused by: java.lang.NullPointerException
  at $anonfun$checkValue$1.apply(<console>:28)
  at $anonfun$checkValue$1.apply(<console>:26)
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:108)
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:107)
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1063)

How can I add a new column that has the same values as the card_type_details column?

Upvotes: 0

Views: 1488

Answers (1)

Vincent Doba

Reputation: 5078

To add the tmp column with the same value as card_type_details, you just do:

inputDF2.withColumn("tmp", col("card_type_details"))

If you aim to add a column with an empty map and avoid the NullPointerException, the solution is:

inputDF2.withColumn("tmp", typedLit(Map.empty[String, Int]))
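
If the column can still end up null (as it does with lit(null) cast "map<String, Int>"), another option is to make the UDF itself null-safe. Spark hands a null map straight to the Scala function, which is why card_type_details.isEmpty throws. Below is a minimal sketch of that idea, assuming a local SparkSession; the names MapColumnSketch and fillEmpty are made up for illustration and are not part of the question's code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf}

object MapColumnSketch {
  // Hypothetical helper: treats a null map the same as an empty one.
  // Option(m) turns null into None, so .isEmpty is never called on null.
  def fillEmpty(m: Map[String, Int]): Map[String, Int] =
    Option(m).filter(_.nonEmpty).getOrElse(Map("0" -> 1))

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-column-sketch")
      .getOrCreate()
    import spark.implicits._

    val inputDF2 = Seq(
      (1, "Visa", 1, Map[String, Int]()),
      (2, "MC", 2, Map[String, Int]())
    ).toDF("id", "card_type", "number_of_cards", "card_type_details")

    // Wrap the null-safe helper as a UDF.
    val checkValue = udf(fillEmpty _)

    inputDF2
      .withColumn("tmp", lit(null).cast("map<string,int>")) // null map column
      .withColumn("value", checkValue(col("tmp")))          // no NullPointerException
      .show(false)

    spark.stop()
  }
}
```

With this version, both a null tmp and an empty card_type_details should fall through to the default map, while a non-empty map is returned unchanged.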

Upvotes: 1
