Edamame

Reputation: 25366

DataFrame error: "overloaded method value filter with alternatives"

I am trying to create a new DataFrame by filtering out the rows where a field is null or an empty string, using the code below:

val df1 = df.filter(df("fieldA") != "").cache()

Then I got the following error:

 <console>:32: error: overloaded method value filter with alternatives:
      (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
      (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
     cannot be applied to (Boolean)
                  val df1 = df.filter(df("fieldA") != "").cache()
                                 ^

Does anyone know what I missed here? Thanks!

Upvotes: 29

Views: 44690

Answers (1)

Daniel de Paula

Reputation: 17862

In Scala, to compare for equality column-wise you should use === and !== (in Spark 2.0+, !== is deprecated in favor of =!=):

val df1 = df.filter(df("fieldA") !== "").cache()

Alternatively, you can use an expression:

val df1 = df.filter("fieldA != ''").cache()
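Since the question mentions both null and empty strings, here is a minimal sketch of a filter that spells out both checks, assuming Spark 2.0+ and a local SparkSession. The helper name keepNonEmpty and the sample data are made up for illustration. In Spark SQL a comparison against null evaluates to null, and filter drops such rows, so the isNotNull check is not strictly needed; it just makes the intent explicit.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NonEmptyFilter {
  // Hypothetical helper: keeps rows whose column is neither null nor "".
  // =!= and && both return a Column, so the result is a valid filter condition.
  def keepNonEmpty(df: DataFrame, name: String): DataFrame =
    df.filter(col(name).isNotNull && col(name) =!= "")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; reuse your existing session in practice.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("non-empty-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: one normal value, one empty string, one null.
    val df = Seq((1, "a"), (2, ""), (3, null: String)).toDF("id", "fieldA")

    keepNonEmpty(df, "fieldA").show()
    spark.stop()
  }
}
```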

Your error happened because the != operator is defined on every Scala object and compares the objects themselves, always returning a Boolean. The filter function, however, expects either a Column or an expression in a String, which is why the compiler reports that neither overload can be applied to (Boolean). The Column class defines its own !== operator, which returns another Column describing the comparison, and that Column can then be passed to filter.
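The difference can be reproduced without Spark. Below is a toy sketch in plain Scala: Col is a made-up stand-in and not Spark's actual Column class, and the two filter methods only imitate the overloads shown in the error message.

```scala
// Toy stand-in for a column expression; NOT Spark's real Column class.
final case class Col(expr: String) {
  // Returns a new expression instead of a Boolean, in the spirit of Column's =!=.
  def =!=(other: Any): Col = Col(s"($expr != '$other')")
}

object OperatorSketch {
  // Imitates filter's two overloads: an expression object or a String, never a Boolean.
  def filter(condition: Col): String = s"WHERE ${condition.expr}"
  def filter(conditionExpr: String): String = s"WHERE $conditionExpr"

  def main(args: Array[String]): Unit = {
    val fieldA = Col("fieldA")

    // The universal != compares the two objects right away and yields a Boolean.
    val compared: Boolean = fieldA != ""

    // The custom operator builds a description of the comparison instead.
    val condition: Col = fieldA =!= ""

    println(compared)
    println(filter(condition))

    // filter(fieldA != "") does not compile: no overload accepts a Boolean.
  }
}
```

The commented-out call at the end fails for the same reason as the original code: the argument is already a Boolean by the time overload resolution looks at it.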

To see all operations available on columns, the Column scaladoc is very useful. The functions package (org.apache.spark.sql.functions) is also worth a look.

Upvotes: 44
