WoodChopper

Reputation: 4375

Spark DataFrame: checking equality and filtering

How do I filter a column on a particular value?

This works fine:

scala> dataframe.filter("postalCode > 900").count()

but == fails

scala> dataframe.filter("postalCode == 900").count()
java.lang.RuntimeException: [1.13] failure: identifier expected

postalCode == 900 ##Error line

I know I am missing something obvious, but I can't figure it out. I checked the API docs and SO for the same. I also tried ===.

Upvotes: 0

Views: 7564

Answers (3)

Rajesh Dommati

Reputation: 21

You could use the === operator with filter/where as below; where is basically an alias for filter.

Using the same example as zero323:

val df = sc.parallelize(Seq(("foo", 900), ("bar", 100))).toDF("k", "postalCode")

df.where($"postalCode" === 900).show +---+----------+ | k|postalCode| +---+----------+ |foo| 900| +---+----------+

df.filter($"postalCode" === 900).show +---+----------+ | k|postalCode| +---+----------+ |foo| 900| +---+----------+

df.filter(df("postalCode") === 900).show +---+----------+ | k|postalCode| +---+----------+ |foo| 900| +---+----------+

Upvotes: 0

Alberto Bonsanto

Reputation: 18022

In Python it can be approached this way (using @zero323's data):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

df = sqlContext.createDataFrame(sc.parallelize(
    [("foo", 900), ("bar", 100)]),
    StructType([
        StructField("k", StringType(), True),
        StructField("v", IntegerType(), True)
    ])
)

filtered_df = df.where(df.v == 900)
filtered_df.show()
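
For completeness, the same filter can also be written as a SQL expression string, which is the form the original question uses. A minimal sketch, assuming the df defined above (note the single = inside the string):

# single '=' because the string is parsed as a SQL expression
filtered_df = df.where("v = 900")
filtered_df.show()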

Upvotes: 2

zero323

Reputation: 330063

The expression string you pass to filter / where should be a valid SQL expression, which means you have to use a single equals operator:

dataframe.filter("postalCode = 900")

An example:

val df = sc.parallelize(Seq(("foo", 900), ("bar", 100))).toDF("k", "postalCode")
df.where("postalCode = 900").show

// +---+----------+
// |  k|postalCode|
// +---+----------+
// |foo|       900|
// +---+----------+

Upvotes: 1
