Reputation: 249
If I have a DataFrame called df that looks like:
+---+---+
| a1| a2|
+---+---+
|foo|bar|
|N/A|baz|
+---+---+
I would expect from:
val df2 = df.withColumn("a1", when($"a1" == "N/A", $"a2"))
that df2 would look like:
+---+---+
| a1| a2|
+---+---+
|foo|bar|
|baz|baz|
+---+---+
but instead I get:
error: type mismatch;
found : Boolean
required: org.apache.spark.sql.Column
So it sounds like I need a method of Column that produces its value within a DataFrame's withColumn method.
Is there such a method, or another way to conditionally populate the replacement argument of withColumn based on the current column's value?
Upvotes: 2
Views: 5365
Reputation: 5572
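You need to use === instead of ==. Scala's == returns a Boolean, whereas === is a Column method that returns a Column, which is what when expects: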
scala> val df = Seq(("foo", "bar"), ("N/A", "baz")).toDF("a1", "a2")
df: org.apache.spark.sql.DataFrame = [a1: string, a2: string]
scala> df.show
+---+---+
| a1| a2|
+---+---+
|foo|bar|
|N/A|baz|
+---+---+
scala> df.withColumn("a1", when($"a1" === "N/A", $"a2").otherwise($"a1")).show
+---+---+
| a1| a2|
+---+---+
|foo|bar|
|baz|baz|
+---+---+
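To make the type difference concrete, here is a minimal sketch that builds both comparisons outside the shell. The object name EqualityTypes and its field names are made up for illustration; it assumes Spark SQL is on the classpath and uses col("a1") in place of $"a1" so that no SparkSession is needed just to construct the expressions.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, when}

object EqualityTypes {
  // == is Scala's universal equality. It compares the Column object itself
  // with the String "N/A", so the result is a plain Boolean (false here).
  val scalaEq: Boolean = col("a1") == "N/A"

  // === is defined on Column and builds an unevaluated comparison expression.
  val sparkEq: Column = col("a1") === "N/A"

  // when takes a Column condition, so only the === form type-checks.
  val replaced: Column = when(sparkEq, col("a2")).otherwise(col("a1"))

  def main(args: Array[String]): Unit = {
    println(s"== gives a Boolean: $scalaEq")
    println(s"=== gives a Column: $sparkEq")
  }
}
```

Passing scalaEq to when is what produces the "found: Boolean, required: org.apache.spark.sql.Column" error from the question.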
Upvotes: 3
Reputation: 215047
You need === instead of ==:
val df2 = df.withColumn("a1", when($"a1" === "N/A", $"a2").otherwise($"a1"))
// df2: org.apache.spark.sql.DataFrame = [a1: string, a2: string]
df2.show
+---+---+
| a1| a2|
+---+---+
|foo|bar|
|baz|baz|
+---+---+
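The otherwise($"a1") part matters as much as ===. As a rough sketch (the object and helper names ReplaceNA, fillFromA2 and withoutOtherwise are hypothetical, and a local SparkSession is assumed), the following standalone program runs the replacement with and without otherwise. It also shows the imports that a compiled application has to add itself, since spark-shell provides them automatically.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.when

object ReplaceNA {
  // Replaces "N/A" in a1 with the value of a2; all other rows keep a1.
  def fillFromA2(df: DataFrame): DataFrame =
    df.withColumn("a1", when(df("a1") === "N/A", df("a2")).otherwise(df("a1")))

  // Same condition without otherwise: rows that do not match become null.
  def withoutOtherwise(df: DataFrame): DataFrame =
    df.withColumn("a1", when(df("a1") === "N/A", df("a2")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: run locally for the demo
      .appName("replace-na")
      .getOrCreate()
    import spark.implicits._ // enables toDF and the $"col" syntax

    val df = Seq(("foo", "bar"), ("N/A", "baz")).toDF("a1", "a2")
    fillFromA2(df).show()       // a1 is foo, baz
    withoutOtherwise(df).show() // a1 is null, baz
    spark.stop()
  }
}
```

Without otherwise, the "foo" row would lose its value, which is why both answers include it even though the question's attempt did not.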
Upvotes: 7