Justin Pihony

Reputation: 67135

Create a new column using a map

Is there a way (without using a UDF) to take an existing DataFrame and create a new column by looking up each value of an existing column in a map?

df.withColumn("newCol", transform(col("existing").using(map)))

where the map's key type matches the type of the existing column, and its value is the output I want.

Upvotes: 2

Views: 2038

Answers (2)

Arnon Rotem-Gal-Oz

Reputation: 25929

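import sqtx.implicits._  // sqtx: presumably the SQLContext in scope (needed for toDF)
val x = Map("foo" -> 1, "bar" -> 2, "baz" -> 3)

val df = sc.parallelize(Seq(
  (1, "foo"), (2, "bar"), (3, "foobar")
)).toDF("id", "existing")
// Look up each row's "existing" value in x, defaulting to 0 when the key is missing
df.map(r => (r.getInt(0), x.getOrElse(r.getString(1), 0))).toDF("id", "new")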
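
The same lookup with a default of 0 can also be expressed purely with column expressions, by folding the map into a chain of when/otherwise calls. This is only a sketch and not part of the original answer: it assumes a local SparkSession named spark (in spark-shell one already exists) and a map small enough to be inlined into the query plan.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("when-lookup").getOrCreate()
import spark.implicits._

val x = Map("foo" -> 1, "bar" -> 2, "baz" -> 3)
val df = Seq((1, "foo"), (2, "bar"), (3, "foobar")).toDF("id", "existing")

// Start from the default value 0 and wrap one `when` per map entry around it.
val lookup: Column = x.foldLeft(lit(0)) { case (acc, (k, v)) =>
  when(col("existing") === k, v).otherwise(acc)
}

// Rows whose key is not in the map fall through to the default.
val result = df.withColumn("new", lookup)
```

Unlike the df.map version, this keeps the existing column and stays inside the DataFrame API, but the expression grows with the size of the map.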

Upvotes: 0

zero323

Reputation: 330393

You can convert the Map to a DataFrame and use a left join, so that rows without a matching key get null:

val df = sc.parallelize(Seq(
    (1, "foo"), (2, "bar"), (3, "foobar")
)).toDF("id", "existing")

val map = Map("foo" -> 1, "bar" -> 2)
val lookup = sc.parallelize(map.toSeq).toDF("key", "value")

df
 .join(lookup, $"existing" <=> $"key", "left")
 .drop("key")
 .withColumnRenamed("value", "newCol")
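
If the map is small, a join-free alternative is to embed it as a literal map column and index it with the existing column. A minimal sketch, assuming Spark 2.2 or later (where typedLit is available) and a local SparkSession named spark; keys missing from the map yield null, as with the left join above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, typedLit}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("map-lookup").getOrCreate()
import spark.implicits._

val df = Seq((1, "foo"), (2, "bar"), (3, "foobar")).toDF("id", "existing")
val map = Map("foo" -> 1, "bar" -> 2)

// Embed the Scala Map as a literal MapType column (typedLit requires Spark 2.2+).
val mapCol = typedLit(map)

// Indexing the map column with another column looks up each row's key;
// keys absent from the map produce null.
val result = df.withColumn("newCol", mapCol(col("existing")))
```

For larger maps the join is likely the better fit, since a literal map is inlined into every query plan that uses it.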

Upvotes: 2
