Reputation: 735
I am trying to make sure that a particular column in a dataframe does not contain any illegal values (non- numerical data). For this purpose I am trying to use a regex matching using rlike
to collect illegal values in the data:
I need to collect the values with string characters or spaces or commas or any other characters that are not like numbers. I tried:
spark.sql("select * from tabl where UPC not rlike '[0-9]*'").show()
but this doesn't work. it produces 0 rows.
Any help is appreciated. Thank you.
Upvotes: 9
Views: 22174
Reputation: 8044
rlike
is looking for any match within the string. The asterisk (*) means 0 or many.
Having zero numbers somewhere in a string applies to every possible string.
You need to specify that you want to match from beginning ^
til the end of string $
spark.sql("select * from tabl where UPC not rlike '^[0-9]*$'").show()
alternatively you can also match for any single non numeric character within the string [^0-9]
spark.sql("select * from tabl where UPC rlike '[^0-9]'").show()
Upvotes: 12