Why does difflib.get_close_matches throw a "list index out of range" error when no matches are found in the following example?
from pandas import DataFrame
import difflib
df1 = DataFrame([[1, '034567', 'Foo'],
                 [2, '1cd2346', 'Bar']],
                columns=['ID', 'Unit', 'Name'])
df2 = DataFrame([['SellTEST', '0ab1234567'],
                 ['superVAR', '1ab2345']],
                columns=['Seller', 'Unit'])
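# Replace each df2 unit with its closest df1 unit; this line raises IndexError: list index out of range
df2['Unit'] = df2['Unit'].apply(lambda x: difflib.get_close_matches(x, df1['Unit'])[0])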
df1.merge(df2)
I get that the value in df1 is way off, but I wouldn't expect this to raise an error like it does; I would expect it to simply not match.
get_close_matches does simply not match: the list returned by difflib.get_close_matches is empty, and you then try to access its first element with [0], which raises the IndexError.
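To see why only one of the lookups comes back empty, the sketch below scores each df2 unit against each df1 unit with difflib.SequenceMatcher(...).ratio(), the similarity measure get_close_matches uses, and then calls get_close_matches with its defaults (n=3, cutoff=0.6). Plain Python lists stand in for the two Unit columns so the example runs without pandas.

```python
import difflib

# Plain lists standing in for df1['Unit'] and df2['Unit'] from the question.
df1_units = ['034567', '1cd2346']
df2_units = ['0ab1234567', '1ab2345']

for word in df2_units:
    for candidate in df1_units:
        # Similarity score in [0, 1]; get_close_matches keeps candidates scoring >= cutoff.
        score = difflib.SequenceMatcher(None, candidate, word).ratio()
        print(f"{word!r} vs {candidate!r}: {score:.3f}")
    # Defaults n=3, cutoff=0.6: an empty list comes back when nothing reaches 0.6.
    print(f"get_close_matches -> {difflib.get_close_matches(word, df1_units)}")
```

For '0ab1234567' this prints a score of 0.750 against '034567', so that unit matches. For '1ab2345' the scores are 0.462 and 0.571, both under 0.6, so get_close_matches returns [] and the [0] in the question's lambda has nothing to index.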
If you want to replace each element that has no match with None, you could use this code instead. It relies on the fact that an empty list is falsey: [:1] yields either a one-element list or an empty list, and "or [None]" swaps the empty list for [None], so indexing with [0] always succeeds:
df2['Unit'] = df2['Unit'].apply(lambda x: (difflib.get_close_matches(x, df1['Unit'])[:1] or [None])[0])
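If the slicing trick feels too clever, the same behaviour can be moved into a small named function. This is only a sketch: closest_or_none is a hypothetical helper, not part of difflib. It asks get_close_matches for at most one result (n=1) and returns None when the list comes back empty.

```python
import difflib
from typing import Iterable, Optional


def closest_or_none(word: str, possibilities: Iterable[str], cutoff: float = 0.6) -> Optional[str]:
    """Hypothetical helper: best close match for word, or None if nothing reaches cutoff."""
    # n=1 requests at most one result: the highest-scoring candidate.
    matches = difflib.get_close_matches(word, possibilities, n=1, cutoff=cutoff)
    # An empty list means no candidate scored >= cutoff, so don't index it.
    return matches[0] if matches else None


df1_units = ['034567', '1cd2346']  # stand-in for df1['Unit']
print([closest_or_none(u, df1_units) for u in ['0ab1234567', '1ab2345']])  # ['034567', None]

# With pandas, the helper plugs into the question's code unchanged:
# df2['Unit'] = df2['Unit'].apply(lambda x: closest_or_none(x, df1['Unit']))
```

Keeping cutoff as a parameter also makes the threshold explicit: calling closest_or_none('1ab2345', df1_units, cutoff=0.5) returns '1cd2346' instead of None, since that pair scores about 0.571.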