I'm able to create a new DataFrame with one column of Map type:
val inputDF2 = Seq(
(1, "Visa", 1, Map[String, Int]()),
(2, "MC", 2, Map[String, Int]())).toDF("id", "card_type", "number_of_cards", "card_type_details")
scala> inputDF2.show(false)
+---+---------+---------------+-----------------+
|id |card_type|number_of_cards|card_type_details|
+---+---------+---------------+-----------------+
|1  |Visa     |1              |[]               |
|2  |MC       |2              |[]               |
+---+---------+---------------+-----------------+
Now I want to create a new column of the same type as card_type_details, using Spark's withColumn method:
inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>").show(false)
+---+---------+---------------+-----------------+----+
|id |card_type|number_of_cards|card_type_details|tmp |
+---+---------+---------------+-----------------+----+
|1  |Visa     |1              |[]               |null|
|2  |MC       |2              |[]               |null|
+---+---------+---------------+-----------------+----+
When I checked the schema, both columns have the same map type (only valueContainsNull differs), but their values are different:
scala> inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>").printSchema
root
 |-- id: integer (nullable = false)
 |-- card_type: string (nullable = true)
 |-- number_of_cards: integer (nullable = false)
 |-- card_type_details: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)
 |-- tmp: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = true)
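The schemas agree on map<string,int>, but the columns hold different kinds of values: `lit(null) cast "map<String, Int>"` is a NULL that merely carries a map type, while every card_type_details row holds a non-null empty map. The valueContainsNull difference comes from the cast target, whose value type defaults to nullable, whereas the primitive Int in the Scala Map is not. A minimal sketch to check the NULL-versus-empty difference, assuming spark-sql 2.x or later on the classpath and a local SparkSession; the names NullVsEmptyMap, details_is_null and tmp_is_null are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object NullVsEmptyMap {
  // One row: card_type_details holds an empty map, tmp holds a typed NULL.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, Map.empty[String, Int]))
      .toDF("id", "card_type_details")
      .withColumn("tmp", lit(null).cast("map<string,int>"))
      // Report which of the two map columns is NULL.
      .select(
        col("card_type_details").isNull.as("details_is_null"),
        col("tmp").isNull.as("tmp_is_null"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-vs-empty-map")
      .getOrCreate()
    try build(spark).show()
    finally spark.stop()
  }
}
```

It should report false for details_is_null and true for tmp_is_null, which is why a UDF that expects a map receives null for tmp.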
I'm not sure whether I'm adding the new column correctly. The problem appears when I apply the .isEmpty method to the tmp column: I get a NullPointerException.
scala> def checkValue = udf((card_type_details: Map[String, Int]) => {
  var output_map = Map[String, Int]()
  if (card_type_details.isEmpty) { output_map += 0.toString -> 1 }
  else { output_map = card_type_details }
  output_map
})
checkValue: org.apache.spark.sql.expressions.UserDefinedFunction
scala> inputDF2.withColumn("value", checkValue(col("card_type_details"))).show(false)
+---+---------+---------------+-----------------+--------+
|id |card_type|number_of_cards|card_type_details|value   |
+---+---------+---------------+-----------------+--------+
|1  |Visa     |1              |[]               |[0 -> 1]|
|2  |MC       |2              |[]               |[0 -> 1]|
+---+---------+---------------+-----------------+--------+
scala> inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>")
.withColumn("value", checkValue(col("tmp"))).show(false)
org.apache.spark.SparkException: Failed to execute user defined function($anonfun$checkValue$1: (map<string,int>) => map<string,int>)
Caused by: java.lang.NullPointerException
at $anonfun$checkValue$1.apply(<console>:28)
at $anonfun$checkValue$1.apply(<console>:26)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:108)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:107)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1063)
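The trace points at the UDF body: for a NULL map column, Spark hands the Scala function a null reference, so `card_type_details.isEmpty` throws. One way to guard against this is to wrap the argument in `Option` before touching it. The sketch below is plain Scala with no Spark dependency; treating NULL the same as an empty map is an assumption about the intended behavior, and NullSafeCheckValue and checkValueLogic are hypothetical names:

```scala
object NullSafeCheckValue {
  // Same logic as the question's checkValue UDF, but null-safe:
  // Spark passes a Scala null for a NULL map, so wrap it in Option first.
  def checkValueLogic(cardTypeDetails: Map[String, Int]): Map[String, Int] =
    Option(cardTypeDetails) match {
      case Some(details) if details.nonEmpty => details        // keep non-empty maps as-is
      case _                                 => Map("0" -> 1)  // NULL or empty map
    }

  def main(args: Array[String]): Unit = {
    println(checkValueLogic(null))
    println(checkValueLogic(Map.empty))
    println(checkValueLogic(Map("Visa" -> 2)))
  }
}
```

To use it in Spark, the function can be wrapped the same way as before, for example `udf(NullSafeCheckValue.checkValueLogic _)` with `org.apache.spark.sql.functions.udf` imported.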
How can I add a new column that has the same values as the card_type_details column?
To add the tmp column with the same values as card_type_details, you just do:
inputDF2.withColumn("tmp", col("card_type_details"))
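Copying the column keeps both its type and its non-null empty maps, so even the question's original checkValue logic (no null check) works on the copy. A minimal sketch, assuming spark-sql 2.x or later and a local SparkSession; CopyMapColumn and build are hypothetical names, and the UDF is a condensed rewrite of the question's checkValue with the same logic:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object CopyMapColumn {
  // The question's checkValue logic, unchanged: it does NOT handle NULL input.
  val checkValue = udf((details: Map[String, Int]) =>
    if (details.isEmpty) Map("0" -> 1) else details)

  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val inputDF2 = Seq(
      (1, "Visa", 1, Map[String, Int]()),
      (2, "MC", 2, Map[String, Int]())
    ).toDF("id", "card_type", "number_of_cards", "card_type_details")

    inputDF2
      // tmp is a copy of card_type_details: same type, same non-null values.
      .withColumn("tmp", col("card_type_details"))
      .withColumn("value", checkValue(col("tmp")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("copy-map-column")
      .getOrCreate()
    try build(spark).show(false)
    finally spark.stop()
  }
}
```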
If you want to add a column with an empty map and avoid the NullPointerException, use typedLit with the same key and value types as card_type_details:
inputDF2.withColumn("tmp", typedLit(Map.empty[String, Int]))
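With typedLit the column holds a real empty map rather than NULL, so the unmodified checkValue logic no longer throws. A minimal sketch, assuming Spark 2.2 or later (where typedLit is available) and a local SparkSession; EmptyMapColumn is a hypothetical name, and `Map.empty[String, Int]` is used so the literal's key and value types match card_type_details:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, typedLit, udf}

object EmptyMapColumn {
  // The question's checkValue logic, unchanged: it does NOT handle NULL input.
  val checkValue = udf((details: Map[String, Int]) =>
    if (details.isEmpty) Map("0" -> 1) else details)

  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, "Visa"), (2, "MC"))
      .toDF("id", "card_type")
      // typedLit derives map<string,int> from the Scala type and yields an
      // empty, non-null map in every row, so the UDF never receives null.
      .withColumn("tmp", typedLit(Map.empty[String, Int]))
      .withColumn("value", checkValue(col("tmp")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-map-column")
      .getOrCreate()
    try {
      val df = build(spark)
      df.printSchema()
      df.show(false)
    } finally spark.stop()
  }
}
```

Because the Scala value type is the primitive Int, the schema printed for tmp should show valueContainsNull = false, matching card_type_details.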