KishoreKumar

Reputation: 143

Get value from a map for a column value as a key in Spark DataFrames

I want to get a value from a map, using a column value as the key, and create a new column from it.

I have tried the following

val testMap = Map("abc" -> "1234", "xyz" -> "3456")

def checkthemap(testmap: Map[String, String], key: String): String =
    testmap.getOrElse(key, "")  // Map#get returns an Option, never null

val testDF = analysis
    .withColumn("test", lit(checkthemap(testMap,$"col")))

The method accepts a String, not a Column. How do I change my withColumn statement to pass the column value as a String to the method?

Upvotes: 8

Views: 26104

Answers (5)

Grigoriev Nick

Reputation: 1129

Or even simpler, from Spark 2.4.x:

List(Map("a" -> "b")).toDF("map").select("map.a").show()

Upvotes: 1

Hans Adriaans

Reputation: 51

I had some issues with the previous answer, since I could not add a column using element_at. Try this (Spark 3+):

df.withColumn(sourceColName, map_values(map_filter(typedLit(testMap), (k, _) => k === col("id")))(0))
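For reference, here is a self-contained sketch of this approach; the df, testMap, and sourceColName names are assumed from the question, and the sample data is illustrative:

import org.apache.spark.sql.functions.{col, map_filter, map_values, typedLit}
import spark.implicits._

val testMap = Map("abc" -> "1234", "xyz" -> "3456")
val df = Seq("abc", "xyz", "zzz").toDF("id")

// Keep only the entry whose key equals the id column, then take its value;
// when no key matches, the filtered map is empty and the lookup yields null.
df.withColumn("value",
  map_values(map_filter(typedLit(testMap), (k, _) => k === col("id")))(0)
).show()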

Upvotes: 1

Chongguang

Reputation: 11

I think you can use the built-in function element_at.

Its definition is: "Returns element of array at given index in value if column is array. Returns value for the given key in value if column is map."

import org.apache.spark.sql.functions.{element_at, col, typedLit}
df.withColumn("value", element_at(typedLit(testMap), col("colName")))

Upvotes: -1

Leo C

Reputation: 22449

I would suggest using a UDF (user-defined function) that takes the column as the key into the passed-in lookup Map and returns the corresponding Map value, as shown below:

import org.apache.spark.sql.functions._
import spark.implicits._

def getMapValue(m: Map[String, String], defaultValue: String) = udf{
  (key: String) => m.getOrElse(key, defaultValue)
}

val df = Seq(
  (1, "a"), (2, "b"), (3, "c")
).toDF("id", "key")

val lookupMap = Map("a" -> "xx", "c" -> "zz")

df.withColumn("value", getMapValue(lookupMap, "")($"key")).show
// +---+---+-----+
// | id|key|value|
// +---+---+-----+
// |  1|  a|   xx|
// |  2|  b|     |
// |  3|  c|   zz|
// +---+---+-----+

EDIT: See the other answer for a solution using built-in Spark functions which generally perform better than UDFs.

Upvotes: 2

user11425401

Reputation: 181

The accepted answer is very inefficient and unnecessarily complicated. Instead, you should just treat testMap as a literal. Import the following:

import org.apache.spark.sql.functions.{coalesce, lit, typedLit}

convert the map to a column:

val testMapCol = typedLit(testMap)

and just select:

df.withColumn("value", coalesce(testMapCol($"col"), lit("")))
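A quick usage sketch with illustrative data, reusing testMap from the question:

import spark.implicits._

val df = Seq("abc", "xyz", "zzz").toDF("col")

df.withColumn("value", coalesce(testMapCol($"col"), lit(""))).show()
// +---+-----+
// |col|value|
// +---+-----+
// |abc| 1234|
// |xyz| 3456|
// |zzz|     |
// +---+-----+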

Upvotes: 18
