yoel

Reputation: 249

In DataFrame.withColumn, how can I use the column's value as a condition for the second parameter?

If I have a DataFrame called df that looks like:

+---+---+
| a1| a2|
+---+---+
|foo|bar|
|N/A|baz|
+---+---+

I would expect from:

val df2 = df.withColumn("a1", when($"a1" == "N/A", $"a2"))

that df2 would look like:

+---+---+
| a1| a2|
+---+---+
|foo|bar|
|baz|baz|
+---+---+

but instead I get:

error: type mismatch;
 found   : Boolean
 required: org.apache.spark.sql.Column

So it seems I need a method of Column that yields a Column, rather than a Boolean, for the condition passed to when inside withColumn.

Is there such a method, or another way to conditionally replace a column's value in withColumn based on its current value?

Upvotes: 2

Views: 5365

Answers (2)

evan.oman

Reputation: 5572

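You need to use ===, not ==. Plain == is ordinary Scala equality and returns a Boolean, while === is the Column method that returns a Column. Also add otherwise($"a1") so that rows which don't match keep their current value: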

scala> val df = Seq(("foo", "bar"), ("N/A", "baz")).toDF("a1", "a2")
df: org.apache.spark.sql.DataFrame = [a1: string, a2: string]

scala> df.show
+---+---+
| a1| a2|
+---+---+
|foo|bar|
|N/A|baz|
+---+---+

scala> df.withColumn("a1", when($"a1" === "N/A", $"a2").otherwise($"a1")).show
+---+---+
| a1| a2|
+---+---+
|foo|bar|
|baz|baz|
+---+---+
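
For anyone wondering why the compiler complains, here is a minimal sketch of the difference between the two operators. It assumes a standalone program with a local SparkSession instead of the spark-shell; the object name EqualsVsTripleEquals and the helper replaceNA are made up for illustration only.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical object and helper names, used only for this illustration.
object EqualsVsTripleEquals {

  // Scala's built-in equality compares the Column object itself with a String,
  // so the result is a plain Boolean (false here), not a per-row expression.
  val plainEquality: Boolean = col("a1") == "N/A"

  // Column.=== builds an expression that Spark evaluates for every row.
  val perRowEquality: Column = col("a1") === "N/A"

  // Replace a1 with a2 where a1 is "N/A"; keep a1 everywhere else.
  def replaceNA(df: DataFrame): DataFrame =
    df.withColumn("a1", when(perRowEquality, col("a2")).otherwise(col("a1")))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode is enough for a toy example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("equals-vs-triple-equals")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("foo", "bar"), ("N/A", "baz")).toDF("a1", "a2")
    replaceNA(df).show()

    spark.stop()
  }
}
```

Since when expects a Column as its first argument, passing the Boolean produced by == is what triggers the "found: Boolean, required: org.apache.spark.sql.Column" error from the question.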

Upvotes: 3

akuiper

Reputation: 215047

You need === instead of ==, plus an otherwise clause to keep the original value where the condition is false:

val df2 = df.withColumn("a1", when($"a1" === "N/A", $"a2").otherwise($"a1"))
// df2: org.apache.spark.sql.DataFrame = [a1: string, a2: string]

df2.show
+---+---+
| a1| a2|
+---+---+
|foo|bar|
|baz|baz|
+---+---+
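
The otherwise part matters: a when without otherwise yields null for every row where the condition is false. A rough sketch of the difference, assuming a local SparkSession outside the spark-shell (the object name WhenOtherwise and its two helpers are hypothetical, not part of Spark):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical names, for illustration only.
object WhenOtherwise {

  // No otherwise: rows that don't match the condition get null in a1.
  def withoutOtherwise(df: DataFrame): DataFrame =
    df.withColumn("a1", when(col("a1") === "N/A", col("a2")))

  // With otherwise: rows that don't match keep their current a1 value.
  def withOtherwise(df: DataFrame): DataFrame =
    df.withColumn("a1", when(col("a1") === "N/A", col("a2")).otherwise(col("a1")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("when-otherwise")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("foo", "bar"), ("N/A", "baz")).toDF("a1", "a2")

    withoutOtherwise(df).show() // a1 is null in the "foo" row
    withOtherwise(df).show()    // a1 stays "foo" in that row

    spark.stop()
  }
}
```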

Upvotes: 7
