Reputation: 105
I'm analysing two columns to filter in records that don't look alike:
select firstname, name
from tableA
where lower(firstname) != lower(name)
I accept names that match even if few special characters don't, for instance:
These two names to Databricks are deemed to be different (rightly so) but I want to escape that '-' charachter. Is there a way to say 'look for different names without considering the '-' character'? Or maybe a way to get a index on how much the two strings differ.
Upvotes: 1
Views: 1032
Reputation: 87249
There are several approaches for that:
-
, multiple spaces, etc. with single space - you can use regexp_replace function for that. Something like this:select firstname, name
from tableA
where
regex_replace(lower(firstname), '[- ]+', ' ') !=
regex_replace(lower(name), '[- ]+', ' ')
It could be further simplified by creating a user-defined SQL function for that normalization.
select firstname, name
from tableA
where levenstein(lower(firstname), lower(name)) != 0
Upvotes: 1