Reputation: 25812
In Apache Spark API I can use startsWith
function in order to test the value of the column:
myDataFrame.filter(col("columnName").startsWith("PREFIX"))
Is it possible to do the same in Spark SQL expression and if so, could you please show an example?.
Upvotes: 6
Views: 12241
Reputation: 71
Spark 3.3.0 added the startswith function.
[SPARK-37520][SQL] Add the startswith()
and endswith()
string functions
https://issues.apache.org/jira/browse/SPARK-37520
https://spark.apache.org/docs/3.3.0/api/sql/index.html#startswith
Usage
> SELECT startswith('Spark SQL', 'Spark');
true
> SELECT startswith('Spark SQL', 'SQL');
false
> SELECT startswith('Spark SQL', null);
NULL
> SELECT startswith(x'537061726b2053514c', x'537061726b');
true
> SELECT startswith(x'537061726b2053514c', x'53514c');
false
Upvotes: 1
Reputation: 8711
With regexp_replace, you can get the same results without UDFs. Check this out
scala> val df=Seq(("bPREFIX",1),("PREFIXb",2)).toDF("a","b")
df: org.apache.spark.sql.DataFrame = [a: string, b: int]
scala> df.show
+-------+---+
| a| b|
+-------+---+
|bPREFIX| 1|
|PREFIXb| 2|
+-------+---+
scala> df.filter(regexp_replace('a,"""^PREFIX.*""","")==="").show
+-------+---+
| a| b|
+-------+---+
|PREFIXb| 2|
+-------+---+
scala>
or using regexp_extract()
scala> df.filter(regexp_extract('a,"""(^PREFIX)""",1)==="PREFIX").show
+-------+---+
| a| b|
+-------+---+
|PREFIXb| 2|
+-------+---+
scala>
Using instr() function
scala> df.filter("instr(a,'PREFIX')=1").show
+-------+---+
| a| b|
+-------+---+
|PREFIXb| 2|
+-------+---+
scala>
Upvotes: 7
Reputation: 25812
I found the following solution that works with Spark SQL query without custom UDP and out of the box, for example:
CASE WHEN instr(PRICE_SOURCE_INDICATOR,'MAC') = 1
Upvotes: 4
Reputation: 7928
You can create an UDF to do that
import org.apache.spark.sql.functions
val startsWith = udf((columnValue: String) => columnValue.startsWith("PREFIX"))
The UDF will receive the column and check it against the PREFIX, then you can use it as follows:
myDataFrame.filter(startsWith($"columnName"))
If you want a parameter as prefix you can with lit
val startsWith = udf((columnValue: String, prefix:String) => columnValue.startsWith(prefix))
myDataFrame.filter(startsWith($"columnName", lit("PREFIX")))
INPUT EXAMPLE
+-----------+------+
|letter |number|
+-----------+------+
| PREFIXb| 1|
| bPREFIX| 2|
+-----------+------+
OUTPUT EXAMPLE
myDataFrame.filter(startsWith($"letter")).show
+-----------+------+
|letter |number|
+-----------+------+
| PREFIXb| 1|
+-----------+------+
Upvotes: 1