Reputation: 143
I want to get a value from a map, using a column's value as the key, and create a new column with the result.
I have tried the following:
val testMap = Map("abc" -> "1234", "xyz" -> "3456")
def checkthemap(testmap: Map[String, String], key: String): String = {
  // Map#get returns an Option (never null), so use getOrElse for a default
  testmap.getOrElse(key, "")
}
val testDF = analysis
  .withColumn("test", lit(checkthemap(testMap, $"col")))
The method accepts a String, not a Column. How do I change my withColumn statement to pass the column's value as a String to the method?
Upvotes: 8
Views: 26104
Reputation: 1129
Or even simpler, from Spark 2.4.x:
List(Map("a" -> "b")).toDF("map").select("map.a").show()
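Note this reads one fixed, literal key via dot notation rather than a key taken from another column. A minimal sketch of the same idea using the question's testMap (the id column and sample rows are made-up assumptions):
import org.apache.spark.sql.functions.typedLit
import spark.implicits._

val df = Seq(1, 2).toDF("id")
  .withColumn("map", typedLit(Map("abc" -> "1234", "xyz" -> "3456")))

// Dot notation looks up the literal key "abc"; every row yields "1234"
df.select($"id", $"map.abc").show()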
Upvotes: 1
Reputation: 51
I had some issues with the previous answer, since I could not add a column with element_at. Try (Spark 3+):
df.withColumn(sourceColName, map_values(map_filter(typedLit(testMap), (k, _) => k === col("id")))(0))
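For context, a self-contained sketch of this approach (Spark 3+ assumed; the sample data and the output column name "value" are made up for illustration):
import org.apache.spark.sql.functions.{col, map_filter, map_values, typedLit}
import spark.implicits._

val testMap = Map("abc" -> "1234", "xyz" -> "3456")
val df = Seq("abc", "xyz", "zzz").toDF("id")

// map_filter keeps only the entry whose key equals the row's id;
// map_values turns the filtered map into an array, and (0) takes its
// first element. Rows with no matching key yield null.
df.withColumn("value", map_values(map_filter(typedLit(testMap), (k, _) => k === col("id")))(0)).show()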
Upvotes: 1
Reputation: 11
I think you can use the built-in function element_at.
Its definition is: "Returns element of array at given index in value if column is array. Returns value for the given key in value if column is map."
import org.apache.spark.sql.functions.{element_at, col, typedLit}
df.withColumn("value", element_at(typedLit(testMap), col("colName")))
Upvotes: -1
Reputation: 22449
I would suggest using a UDF (user-defined function) that takes the column as the key into the passed-in lookup Map and returns the corresponding Map value, as shown below:
import org.apache.spark.sql.functions._
import spark.implicits._
def getMapValue(m: Map[String, String], defaultValue: String) = udf {
  // Look the key up in the captured map, falling back to the default
  (key: String) => m.getOrElse(key, defaultValue)
}

val df = Seq(
  (1, "a"), (2, "b"), (3, "c")
).toDF("id", "key")

val lookupMap = Map("a" -> "xx", "c" -> "zz")

df.withColumn("value", getMapValue(lookupMap, "")($"key")).show
// +---+---+-----+
// | id|key|value|
// +---+---+-----+
// | 1| a| xx|
// | 2| b| |
// | 3| c| zz|
// +---+---+-----+
EDIT: See the other answer for a solution using built-in Spark functions which generally perform better than UDFs.
Upvotes: 2
Reputation: 181
The accepted answer is very inefficient and unnecessarily complicated. Instead, you should just treat testMap as a literal. Import the following:
import org.apache.spark.sql.functions.{coalesce, lit, typedLit}
Convert the map to a column:
val testMapCol = typedLit(testMap)
and just select:
df.withColumn("value", coalesce(testMapCol($"col"), lit("")))
Upvotes: 18