Reputation: 1009
I have the following function:
val Reg52 = """(?<!\S)(?!(?:[\d.]*\d){6})[0-9]{1,5}(?:\.[0-9]{1,2})?(?!\S)""".r
def verif_rg52 = udf((s: String) =>
  s match {
    case null => 0
    case Reg52(item, _*) => 0
    case _ => 1
  })
It should check whether a DataFrame column contains numbers in the format (5,2): at most 5 digits in total, with at most 2 after the decimal point. I tested the regex and it works.
But when I try it in Scala:
val df1 = Seq(
  "22.0",
  "1000.22"
).toDF("id")
df1.withColumn("r", when(verif_rg52(col("id")) === 0 , "0").otherwise("1")).show(false)
I get:
+-------+---+
|id     |r  |
+-------+---+
|22.0   |1  |
|1000.22|1  |
+-------+---+
But I should get 0 for id = 22.0, because it matches the regex. Any help? Thank you.
Upvotes: 2
Views: 90
Reputation: 51271
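Look at your regex. Every opening paren, "(", is followed by a question mark, "?", so none of them opens a capture group. Because the pattern has no capture groups, case Reg52(item, _*), which requires at least one captured group, never matches, and every input falls through to case _ => 1.
Use case Reg52() or add more parens to specify the capture groups you want.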
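To make the difference concrete, here is a minimal sketch in plain Scala, without Spark, that puts the two patterns side by side. The object name Rg52Check and the helper names brokenVerif and verifRg52 are made up for this illustration; only Reg52 comes from the question.

```scala
object Rg52Check {
  // Regex from the question: every group is (?:...), (?<!...) or (?!...),
  // so it defines zero capture groups.
  val Reg52 = """(?<!\S)(?!(?:[\d.]*\d){6})[0-9]{1,5}(?:\.[0-9]{1,2})?(?!\S)""".r

  // Equivalent to the question's match: the extractor pattern demands
  // at least one captured group, so it can never succeed with this regex.
  def brokenVerif(s: String): Int = s match {
    case null         => 0
    case Reg52(_, _*) => 0
    case _            => 1
  }

  // Fixed version: Reg52() matches when the whole string matches
  // and the regex yields zero groups.
  def verifRg52(s: String): Int = s match {
    case null    => 0
    case Reg52() => 0
    case _       => 1
  }

  def main(args: Array[String]): Unit = {
    println(brokenVerif("22.0"))  // 1: the pattern never matches
    println(verifRg52("22.0"))    // 0: valid (5,2) number
    println(verifRg52("1000.22")) // 1: six digits in total
  }
}
```

In the UDF from the question, the only change needed should be replacing case Reg52(item, _*) with case Reg52(). Note that 1000.22 is still reported as 1, since it has six digits in total.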
Upvotes: 3