S.K

Reputation: 367

Spark - Applying filter/map on a Dataframe using column names not working

Sorry if this is a duplicate, but the solutions suggested elsewhere aren't working for me. Most likely I am missing something basic here. I have a DataFrame like the one below:

inputDF: org.apache.spark.sql.DataFrame = [ts: string, id: string ... 20 more fields]

I am trying to filter some of the "rows" of interest here based on a field called "state" (of type String) by doing this (in Scala):

inputDF.filter(inputDF("state") == "BALANCED").show()

However this gives me an error:

<console>:143: error: overloaded method value filter with alternatives:
  (func: org.apache.spark.api.java.function.FilterFunction[org.apache.spark.sql.Row])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
  (func: org.apache.spark.sql.Row => Boolean)org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
  (conditionExpr: String)org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
  (condition: org.apache.spark.sql.Column)org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 cannot be applied to (Boolean)
       inputDF.filter(inputDF("connState") == "BALANCED").show()

Can someone please point out what is incorrect here? I followed a few examples, including the one at https://rklicksolutions.wordpress.com/2016/03/03/tutorial-spark-1-6-sql-and-dataframe-operations/, but can't figure out what's wrong.

Upvotes: 1

Views: 3453

Answers (1)

S.K

Reputation: 367

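Looks like I need to use === instead of ==. In Scala, == is ordinary equality and evaluates to a Boolean, which none of the filter overloads accept; === is the operator defined on Column, and it returns a Column that filter can take as its condition.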

inputDF.filter(inputDF("state") === "BALANCED").show()

is doing what I want.
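For anyone who wants to try this in isolation, here is a minimal sketch of the difference. The sample rows, the local SparkSession settings, and the val names are made up for illustration; only the column names ts, id and state come from the question. It also shows a few other ways of writing the same condition that, as far as I know, behave the same.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only (assumption). In spark-shell a `spark`
// session and its implicits already exist, so these two lines can be skipped.
val spark = SparkSession.builder().master("local[*]").appName("filter-example").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the real inputDF.
val inputDF = Seq(
  ("2017-01-01", "a", "BALANCED"),
  ("2017-01-02", "b", "UNBALANCED"),
  ("2017-01-03", "c", "BALANCED")
).toDF("ts", "id", "state")

// `==` is plain Scala equality: it compares the Column object itself with a
// String and yields a Boolean, which is why filter(...) rejects it.
val plainEquality: Boolean = inputDF("state") == "BALANCED"

// `===` is defined on Column and builds a Column expression evaluated per row.
val balanced = inputDF.filter(inputDF("state") === "BALANCED")

// Other ways to express the same condition.
val viaCol = inputDF.filter(col("state") === "BALANCED")
val viaEqualTo = inputDF.filter(col("state").equalTo("BALANCED"))
val viaSqlString = inputDF.filter("state = 'BALANCED'")

balanced.show()
```

The string form goes through the filter(conditionExpr: String) overload listed in the error message, so it can be handy when the condition is easier to write as SQL.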

Upvotes: 2
