FunnyChef

Reputation: 1946

Python pandas difflib throws "list index out of range" error

Why does difflib.get_close_matches throw the "list index out of range" Error when no matches are found in the following example?

from pandas import DataFrame
import difflib

df1 = DataFrame([[1,'034567','Foo'],
                 [2,'1cd2346','Bar']], 
                columns=['ID','Unit','Name'])
df2 = DataFrame([['SellTEST','0ab1234567'],
                 ['superVAR','1ab2345']], 
                columns=['Seller', 'Unit'])

df2['Unit'] = df2['Unit'].apply(lambda x: difflib.get_close_matches(x, df1['Unit'])[0])

df1.merge(df2)

I understand that the second value in df2 is a long way from anything in df1, but I wouldn't expect this to raise an error; I would expect it to simply not match.

Upvotes: 0

Views: 2193

Answers (1)

Aaron Christiansen

Reputation: 11817

get_close_matches does simply not match: when nothing is close enough, difflib.get_close_matches returns an empty list. Your code then tries to access the first element of that empty list with [0], and that is what raises the IndexError.
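To see this in isolation, here is a minimal sketch that uses a plain list in place of df1['Unit'] (the name units is only for illustration). It assumes the default cutoff of 0.6, which is what the code in the question uses:

```python
import difflib

# Same values as df1['Unit'] in the question, as a plain list
units = ['034567', '1cd2346']

# No candidate is similar enough to '1ab2345' at the default cutoff of 0.6,
# so the result is an empty list rather than an error
matches = difflib.get_close_matches('1ab2345', units)
print(matches)

# Indexing the empty list is the step that fails
try:
    matches[0]
except IndexError as exc:
    print(type(exc).__name__, exc)
```

The call to get_close_matches itself succeeds; only the [0] afterwards raises.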

If you want None wherever there are no matches, you could use this code instead. It relies on the fact that an empty list is falsy, so the or substitutes [None] before the first element is taken:

df2['Unit'] = df2['Unit'].apply(lambda x: (difflib.get_close_matches(x, df1['Unit'])[:1] or [None])[0])
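If the one-liner is hard to read, the same idea can be written as a small helper. This is only a sketch: closest_or_none is a hypothetical name, and the plain list stands in for df1['Unit'] so the example runs without pandas.

```python
import difflib

def closest_or_none(value, candidates, cutoff=0.6):
    """Return the best close match for value, or None if there is none.

    closest_or_none is a hypothetical helper name, not part of difflib.
    """
    # n=1 asks difflib for the single best match only
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Same values as df1['Unit'] in the question, as a plain list
units = ['034567', '1cd2346']

print(closest_or_none('0ab1234567', units))  # close enough to '034567'
print(closest_or_none('1ab2345', units))     # no match at cutoff 0.6, so None
```

With the DataFrames from the question, this should be usable as df2['Unit'].apply(closest_or_none, candidates=df1['Unit']). Rows that come back as None will simply not join in df1.merge(df2). If you expected '1ab2345' to match '1cd2346', lowering the cutoff (for example cutoff=0.5) is the parameter to experiment with.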

Upvotes: 1
