Reputation: 33
I'm looking to apply the rlike function to a column, but instead of a standard regular-expression string I want to pass in another column (which contains regular-expression strings),
i.e. $col1.rlike($col2)
where $col2 holds a regular expression for each row of the dataframe.
I have tried applying a UDF:
def rLike = udf((s: String, col: Column) => col.rlike(s))
This keeps giving me the error:
java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Column is not supported
Could anyone please shed some light on how I can fix this?
Upvotes: 3
Views: 1019
Reputation: 43
The original question was answered in a comment: the UDF needs to be implemented with the signature (s: String, col: String): Boolean.
For anyone else who stumbles upon this question: without using a UDF, the only semi-clean way I could find to do this as of now is the expr function.
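As a sketch of that fix, the pattern can be passed as a plain String and compiled inside the function. Spark's RLIKE performs a find-anywhere match rather than a whole-string match, so java.util.regex's find() mirrors its semantics; the name rlikeFn is my own, not from the original comment:

```scala
import java.util.regex.Pattern

// Core of an RLIKE-style check: true if the pattern occurs anywhere in the text
def rlikeFn(text: String, pattern: String): Boolean =
  Pattern.compile(pattern).matcher(text).find()
```

This can then be wrapped for DataFrame use, e.g. `val rlikeUdf = udf(rlikeFn _)` followed by `df.where(rlikeUdf($"col1", $"col2"))`.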
df.where(expr("col1 RLIKE col2"))
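For reference, here is a minimal sketch of the expr approach on a toy DataFrame (the column names col1/col2 and the sample rows are my own; this assumes an active SparkSession with its implicits imported):

```scala
import org.apache.spark.sql.functions.expr
import spark.implicits._

val df = Seq(
  ("a123", "[a-z]\\d+"),  // pattern matches the text
  ("a123", "^\\d+$")      // pattern does not match
).toDF("col1", "col2")

// RLIKE accepts a column-valued pattern only via SQL expression syntax
df.where(expr("col1 RLIKE col2")).show()
```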
I found one Jira issue (SPARK-11366) requesting that functions such as rlike accept other columns, but it seems to have been closed without updates.
Upvotes: 0
Reputation: 22449
Method rlike does not support a regex-matching pattern stored in a Column. An alternative is to use regexp_replace, as shown below:
import org.apache.spark.sql.functions._
import spark.implicits._
val df = Seq(
("a123", "[a-z]\\d+"),
("b456", "[a-z]+")
).toDF("text", "pattern")
val matched = "Matched!" // can be any value not present in column `text`
df.where(regexp_replace($"text", $"pattern", lit(matched)) === matched).show
// +----+--------+
// |text| pattern|
// +----+--------+
// |a123|[a-z]\d+|
// +----+--------+
In case you would like to implement a custom rlike as a UDF (which generally doesn't scale as well as native Spark API functions), here's one way:
def rlike = udf((text: String, pattern: String) => {
  val regex = pattern.r  // bind to a val: extractor patterns require a stable identifier
  text match {
    case regex() => true  // note: this anchors the match to the entire string
    case _ => false
  }
})
// Applying the UDF
df.where(rlike($"text", $"pattern"))
Upvotes: 1