Haha

Reputation: 1009

Scala regex not functioning

I have the following function:

val Reg52 = """(?<!\S)(?!(?:[\d.]*\d){6})[0-9]{1,5}(?:\.[0-9]{1,2})?(?!\S)""".r
def verif_rg52 = udf((s: String) =>
    s match {
      case null            => 0
      case Reg52(item, _*) => 0
      case _               => 1
    })

It should verify whether a dataframe column contains numbers in the format (5,2): at most 5 digits in total, with at most 2 after the dot. I tested the regex on its own and it works.

But when I try it in Scala:

val df1 = Seq(
  "22.0",
  "1000.22"
  ).toDF("id")

df1.withColumn("r", when(verif_rg52(col("id")) === 0 , "0").otherwise("1")).show(false)

I get

+-------+---+
|id     |r  |
+-------+---+
|22.0   |1  |
|1000.22|1  |
+-------+---+

But I should get 0 when id=22.0, because that value matches the regex. Any help? Thank you

Upvotes: 2

Views: 90

Answers (1)

jwvh

Reputation: 51271

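Look at your regex. Every opening paren, (, is followed by a question mark, ?, so each group is either a lookaround or a non-capturing group. You have no capture groups, so case Reg52(item, _*), which needs at least one captured group to bind item, doesn't match your pattern.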

Use case Reg52() or add more parens to specify the capture groups you want.
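
Both options can be sketched without Spark, as plain Scala. This is only an illustration: the object name Rg52Check, the helpers check and extract, and the second pattern Reg52Captured are made up for this example and are not part of the original code. The first pattern is copied from the question unchanged.

```scala
import scala.util.matching.Regex

object Rg52Check {
  // Pattern from the question: only lookarounds and non-capturing groups,
  // so a successful match yields zero captured groups.
  val Reg52: Regex =
    """(?<!\S)(?!(?:[\d.]*\d){6})[0-9]{1,5}(?:\.[0-9]{1,2})?(?!\S)""".r

  // Hypothetical variant: the same pattern with the number wrapped in one capture group.
  val Reg52Captured: Regex =
    """(?<!\S)(?!(?:[\d.]*\d){6})([0-9]{1,5}(?:\.[0-9]{1,2})?)(?!\S)""".r

  // Option 1: match with an empty extractor, since there is nothing to bind.
  // Returns 0 for null or a matching string, 1 otherwise (same convention as the question).
  def check(s: String): Int = s match {
    case null    => 0
    case Reg52() => 0
    case _       => 1
  }

  // Option 2: with a capture group in the pattern, the matched text can be bound to a name.
  def extract(s: String): Option[String] =
    Option(s).collect { case Reg52Captured(item) => item }
}
```

The body of check can be dropped into the udf from the question as-is. A pattern match against a Regex has to match the whole input string, so there is no need to add ^ and $ anchors.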

Upvotes: 3
