Reputation: 1346
I need to check if an e-mail address matches another with the possible exception of one typo (e.g. "hormail" instead of "hotmail", a letter missing, or an extra letter).
It's a postgresql (version 9.1.4) query, and neither fuzzymatch nor levenshtein is available.
Upvotes: 1
Views: 277
Reputation: 1129
You should use the fuzzystrmatch and/or pg_trgm Postgres extensions. You can enable these by running:
$ psql mydb
mydb=# CREATE EXTENSION fuzzystrmatch;
mydb=# CREATE EXTENSION pg_trgm;
You can then perform queries like these:
mydb=# SELECT email from Users where email % 'hormail' or difference(email, 'hormail') > 2;
Upvotes: 0
Reputation: 324445
Here's a starting point. It's a function that breaks both emails down into (char-pos, char) pairs, then finds pairs that're mismatched between the two inputs.
CREATE OR REPLACE FUNCTION email_diffs(
email1 IN text, email2 IN text,
chnum OUT integer, ch OUT text, fromwhich OUT integer
)
RETURNS SETOF RECORD AS $$
BEGIN
RETURN QUERY
WITH
e1chars AS (
SELECT generate_series(0, length($1) -1 ) AS chpos, regexp_split_to_table($1,'') ch
),
e2chars AS (
SELECT generate_series(0, length($2) - 1) AS chpos, regexp_split_to_table($2,'') ch
),
only_in_e1chars AS (
SELECT * FROM e1chars EXCEPT SELECT * FROM e2chars
),
only_in_e2chars AS (
SELECT * FROM e2chars EXCEPT SELECT * FROM e1chars
),
mismatched_pairs AS (
SELECT *, 1 FROM only_in_e1chars UNION SELECT *, 2 FROM only_in_e2chars
)
SELECT * FROM mismatched_pairs;
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
Results look like this:
regress=# SELECT * FROM email_diffs('fred@bo','fred@bob');
chnum | ch | fromwhich
-------+----+-----------
7 | b | 2
(1 row)
regress=# SELECT * FROM email_diffs('fred@bob','fred@bbo');
chnum | ch | fromwhich
-------+----+-----------
6 | b | 2
6 | o | 1
7 | b | 1
7 | o | 2
(4 rows)
You should be able to call this with another CTE that tests for each of the errors you are interested in, or just extend the CTE in that main function with additional clauses to test for each case and return a verdict.
Exactly how to do that will depend on the specifics of the rules you must test and how strict you have to be about detecting exactly one typo. You haven't specified that.
You might have noticed the odd-looking use of a SELECT without a FROM clause in e1chars
and e2chars
, where two functions are called in the SELECT list. This is a very strange PostgreSQL extension to SQL that you really shouldn't generally use, as results are often not what you expect. PostgreSQL will support the SQL-standard LATERAL
syntax in 9.3, and that should be used in preference.
Upvotes: 2