Reputation: 1521
This is a follow-up to my previous question. I am trying to replace strings in one list with strings in another list.
import numpy as np
from difflib import SequenceMatcher
from pprint import pprint

def similar(a, to_match):
    percent_similarity = [SequenceMatcher(None, a, b).ratio() for b in to_match]
    max_value_index = [i for i, j in enumerate(percent_similarity) if j == max(percent_similarity)][0]
    map = [to_match[max_value_index] if max(percent_similarity) > 0.9 else a][0]
    return map

if __name__ == '__main__':
    strlist = ['D-saturn 6-pluto', np.nan, 'D-astroid 3-cyclone', 'DL-astroid 3-cyclone', 'DL-astroid', 'D-comment', 'literal']
    to_match = ['saturn 6-pluto', 'pluto', 'astroid 3-cyclone', 'D-comment', 'D-astroid']
    for item in strlist:
        map = [similar(item, to_match) for item in strlist]
    pprint(map)
Expected output:
['saturn 6-pluto', np.nan, 'astroid 3-cyclone', 'astroid 3-cyclone', 'D-astroid', 'D-comment', 'literal']
The code works if there is no np.nan in strlist.
I want to check whether an item is nan and, if it is, return nan unchanged.
However, I'm not sure how to use an elif statement in the list comprehension map = [to_match[max_value_index] if max(percent_similarity) > 0.9 else a][0].
Could someone help me with this?
Upvotes: 0
Views: 62
Reputation: 10860
EDIT:
OK then, how about changing your similar function to return the item itself if its type is not a string?
def similar(a, to_match):
    if type(a) is not str:
        return a
    percent_similarity = [SequenceMatcher(None, a, b).ratio() for b in to_match]
    max_value_index = [i for i, j in enumerate(percent_similarity) if j == max(percent_similarity)][0]
    ret = [to_match[max_value_index] if max(percent_similarity) > 0.9 else a][0]
    return ret
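For reference, here is a compact sketch of the same guard run against the data from the question. It is only one way to write it: the threshold parameter and the use of max with a key function are my own choices, and float("nan") stands in for np.nan (which is itself a plain Python float) so the snippet runs without numpy.

```python
from difflib import SequenceMatcher

def similar(a, to_match, threshold=0.9):
    # Non-strings (such as nan, which is a float) are returned untouched.
    if not isinstance(a, str):
        return a
    # Pick the candidate with the highest similarity ratio; ties go to the first one.
    best = max(to_match, key=lambda b: SequenceMatcher(None, a, b).ratio())
    score = SequenceMatcher(None, a, best).ratio()
    # Only replace when the best candidate is similar enough (threshold is an assumed parameter).
    return best if score > threshold else a

nan = float("nan")  # stand-in for np.nan
strlist = ['D-saturn 6-pluto', nan, 'D-astroid 3-cyclone', 'DL-astroid 3-cyclone', 'DL-astroid', 'D-comment', 'literal']
to_match = ['saturn 6-pluto', 'pluto', 'astroid 3-cyclone', 'D-comment', 'D-astroid']

result = [similar(item, to_match) for item in strlist]
print(result)
```

The nan entry stays in its original position, so the result has the same length as strlist.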
You can filter your strlist before processing it in the for-loop with:
strlist = [s for s in strlist if type(s) is str]
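One thing to keep in mind with the filtering approach: it removes the nan entries altogether, so the result is shorter than the input and no longer lines up with the expected output in the question, which keeps nan in place. A small sketch with a shortened list (float("nan") is used here as a stand-in for np.nan):

```python
nan = float("nan")  # stand-in for np.nan, which is also a plain float
strlist = ['D-saturn 6-pluto', nan, 'literal']

# Keep only the string items; nan is dropped, not preserved.
filtered = [s for s in strlist if isinstance(s, str)]

print(filtered)                     # ['D-saturn 6-pluto', 'literal']
print(len(strlist), len(filtered))  # 3 2
```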
Upvotes: 1
Reputation: 51
You can put an if/else conditional expression in the other list comprehension, the one that builds map:
map = [similar(item, to_match) if isinstance(item, str) else item for item in strlist]
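As for the elif part of the question: a comprehension has no elif keyword, but conditional expressions can be chained to get the same effect. A minimal sketch, where shout is a hypothetical stand-in for similar(item, to_match) and float("nan") stands in for np.nan:

```python
import math

def shout(s):
    # Hypothetical placeholder for similar(item, to_match).
    return s.upper()

nan = float("nan")
items = ['D-comment', nan, 42]

# if / else: transform strings, pass everything else through unchanged.
two_way = [shout(x) if isinstance(x, str) else x for x in items]

# if / elif / else: chain a second conditional expression after the first else.
three_way = [
    'text' if isinstance(x, str)
    else 'nan' if isinstance(x, float) and math.isnan(x)
    else 'other'
    for x in items
]

print(two_way)
print(three_way)  # ['text', 'nan', 'other']
```

Chains longer than two conditions get hard to read quickly, so at that point moving the logic into a helper function is probably the cleaner choice.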
Upvotes: 0