Reputation: 4375
How do I filter a column by a particular value?
This works fine with >:
scala> dataframe.filter("postalCode > 900").count()
but == fails:
scala> dataframe.filter("postalCode == 900").count()
java.lang.RuntimeException: [1.13] failure: identifier expected
postalCode == 900 ##Error line
I know I am missing something obvious, but I can't figure it out. I checked the API docs and SO for the same problem. I also tried ===.
Upvotes: 0
Views: 7564
Reputation: 21
You could use the "===" operator with filter/where, as below. Basically, where is an alias of filter. This uses the same example as zero323's answer.
val df = sc.parallelize(Seq(("foo", 900), ("bar", 100))).toDF("k", "postalCode")
df.where($"postalCode" === 900).show
+---+----------+
|  k|postalCode|
+---+----------+
|foo|       900|
+---+----------+
df.filter($"postalCode" === 900).show
+---+----------+
|  k|postalCode|
+---+----------+
|foo|       900|
+---+----------+
df.filter(df("postalCode") === 900).show
+---+----------+
|  k|postalCode|
+---+----------+
|foo|       900|
+---+----------+
Upvotes: 0
Reputation: 18022
In Python, it can be approached this way (using @zero323's data):
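from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Assumes the sc and sqlContext objects provided by the pyspark shell
df = sqlContext.createDataFrame(
    sc.parallelize([("foo", 900), ("bar", 100)]),
    StructType([
        StructField("k", StringType(), True),
        StructField("v", IntegerType(), True)
    ])
)
# Column comparisons in Python use the ordinary == operator
filtered_df = df.where(df.v == 900)
filtered_df.show()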
Upvotes: 2
Reputation: 330063
The expression string you pass to filter / where should be a valid SQL expression. That means you have to use a single equals operator:
dataframe.filter("postalCode = 900")
An example:
val df = sc.parallelize(Seq(("foo", 900), ("bar", 100))).toDF("k", "postalCode")
df.where("postalCode = 900").show
// +---+----------+
// |  k|postalCode|
// +---+----------+
// |foo|       900|
// +---+----------+
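For comparison, here is a minimal sketch that puts the SQL-string form next to the Column-based forms on the same data. It assumes Spark 2.x or later, where SparkSession is available; the local session, the app name, and the value names (bySqlString, byColumn, byExpr) are illustrative and not from the original post. In spark-shell a session named spark already exists, so the builder line can be skipped there.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

// Assumption: a throwaway local session, only for this illustration.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("filter-equality-sketch")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the answers above.
val df = Seq(("foo", 900), ("bar", 100)).toDF("k", "postalCode")

// 1. SQL expression string: equality is a single '='.
val bySqlString = df.filter("postalCode = 900").count()

// 2. Column expression: equality is '===', because '==' on a Column
//    is plain Scala object equality and returns a Boolean.
val byColumn = df.filter(col("postalCode") === 900).count()

// 3. expr parses a SQL fragment into a Column, so it also takes '='.
val byExpr = df.where(expr("postalCode = 900")).count()

println(s"sql string: $bySqlString, column: $byColumn, expr: $byExpr")

spark.stop()
```

Each of the three filters should keep only the "foo" row. Which one to use is mostly a matter of taste: the string form reads like SQL, while the Column form is checked by the Scala compiler for syntax.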
Upvotes: 1