Hemanth
Hemanth

Reputation: 735

Spark SQL - Regex for matching only numbers

I am trying to make sure that a particular column in a dataframe does not contain any illegal values (non- numerical data). For this purpose I am trying to use a regex matching using rlike to collect illegal values in the data:

enter image description here

I need to collect the values with string characters or spaces or commas or any other characters that are not like numbers. I tried:

spark.sql("select * from tabl where UPC not rlike '[0-9]*'").show()

but this doesn't work. it produces 0 rows.

Any help is appreciated. Thank you.

Upvotes: 9

Views: 22174

Answers (1)

dre-hh
dre-hh

Reputation: 8044

rlike is looking for any match within the string. The asterisk (*) means 0 or many. Having zero numbers somewhere in a string applies to every possible string. You need to specify that you want to match from beginning ^ til the end of string $

spark.sql("select * from tabl where UPC not rlike '^[0-9]*$'").show()

alternatively you can also match for any single non numeric character within the string [^0-9]

spark.sql("select * from tabl where UPC rlike '[^0-9]'").show()

Upvotes: 12

Related Questions