Jamie Bull
Jamie Bull

Reputation: 13529

Create a new table column with closest string match from another table

I have two lists of names for locations, with slightly different spelling, capitalisation, etc.

I'm trying to match each site in the first list to the most similar one in the second list.

SELECT name1, name2 FROM table1, table2
    WHERE levenshtein(name1, name2) < 3

Is along the right lines, but throws up a few false positives. Most of those would be avoided if I could select the record with the smallest Levenshtein distance.

In Python I would be looking to do:

for row1 in table1:
    min_dist = 100
    for row2 in table2:
        dist = levenshtein(row1.name, row2.name)
        if dist < min_dist:
            min_dist = dist
            closest_name = row2.name
    row1.newcolumn = closest_name

Is this possible in postgresql?

Upvotes: 0

Views: 692

Answers (1)

redneb
redneb

Reputation: 23850

If I understand your question correctly, for each name1, you are looking for the closest name2. That can be accomplished with a subquery, something like that:

SELECT
    name1,
    (SELECT name2
     FROM table2
     WHERE levenshtein(name1, name2) < 3
     ORDER BY levenshtein(name1, name2)
     LIMIT 1
    )
FROM table1

I am assuming that you have defined levenshtein yourself.

Upvotes: 1

Related Questions