Reputation: 9169
I'm trying to do a simple string filter with the Dataset API using startsWith, but I can't get the statement below to work:
ds.filter(_.colToFilter.toString.startsWith("0")).show(false)
It just produces an empty dataset, even though I know the string value is there. I can use contains successfully, like this:
ds.filter(_.colToFilter.toString.contains("0")).show(false)
Not sure what I'm missing here.
Upvotes: 0
Views: 3125
Reputation: 4132
Try the following:
val d = ds.filter($"columnToFilter".contains("0"))
or
val d = ds.filter($"columnToFilter".startsWith("0"))
Example
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
Assuming we have the above dataset, the output will be:
> ds.filter($"name".contains("n")).show()
+---+------+
|age| name|
+---+------+
| 30| Andy|
| 19|Justin|
+---+------+
> ds.filter($"name".startsWith("A")).show()
+---+----+
|age|name|
+---+----+
| 30|Andy|
+---+----+
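The contains/startsWith distinction can also be sanity-checked outside Spark with plain Python string operations; this is just an illustrative sketch using the same names as the example dataset above, not Spark code:

```python
# Names mirroring the example dataset above.
names = ["Michael", "Andy", "Justin"]

# contains("n") keeps any name with an "n" anywhere in it.
contains_n = [n for n in names if "n" in n]

# startsWith("A") keeps only names whose first character is "A".
starts_a = [n for n in names if n.startswith("A")]

print(contains_n)  # ['Andy', 'Justin']
print(starts_a)    # ['Andy']
```

This matches the filtered tables above: "Andy" and "Justin" both contain an "n", but only "Andy" starts with "A".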
Upvotes: 1
Reputation: 41987
You can use the built-in substring function:
import org.apache.spark.sql.functions._
df.filter(substring(col("column_name-to-be_used"), 0, 1) === "0")
And the PySpark equivalent:
from pyspark.sql import functions as f
df.filter(f.substring(f.col("column_name-to-be_used"), 0, 1) == "0")
Note that Spark's substring is 1-based (a start position of 0 is treated the same as 1), so this extracts the first character. You can extend the substring length to check as many leading characters as you want in the starts-with comparison.
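The substring approach amounts to comparing a fixed-length prefix, which is equivalent to a starts-with check of that length. A plain-Python sketch of the equivalence (the sample values here are illustrative, not from the question):

```python
# Hypothetical column values, some with a leading "0".
values = ["0123", "4560", "0789", "999"]

# Taking the first character and comparing it to "0" ...
by_substring = [v for v in values if v[0:1] == "0"]

# ... selects the same rows as startswith("0"). Widen the slice
# (and the prefix) to check more leading characters.
by_startswith = [v for v in values if v.startswith("0")]

assert by_substring == by_startswith
print(by_substring)  # ['0123', '0789']
```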
Upvotes: 1