Reputation: 111
I am working on a small project and need some help with searching for text in strings.
Let's say I have a primary string1 such as: Loan Coordinator
Let's say I have another string2 such as: Financial Student Loan Coordinator
Let's say I have another string3 such as: Loan Operator
Let's say I have another string4 such as: Coordinator
Let's say I have another string5 such as: Financial Assistant
. .
In Python, what would be the best approach to find all strings that have something to do with string1? For example:
String 2 is related to String 1 because it contains the text Loan Coordinator.
String 3 is related because of the word Loan.
String 4 is related because of the word Coordinator.
String 5 is unrelated, so I don't care about this string.
Strings 2, 3, and 4 should return FOUND or something that indicates a partial match is present.
..
Thanks for all assistance!
Upvotes: 0
Views: 274
Reputation: 6190
#!/usr/bin/env python
import sys


def tokenise(s):
    # Lowercase each whitespace-separated word so matching ignores case
    return set(word.lower() for word in s.split())


def match_strings(primary, secondary):
    primary_tokens = tokenise(primary)
    secondary_tokens = tokenise(secondary)
    # Words that appear in both strings
    matches = primary_tokens.intersection(secondary_tokens)
    if matches:
        print("{} matches because of {}".format(secondary, ", ".join(sorted(matches))))
    else:
        print("{} doesn't match".format(secondary))


if __name__ == "__main__":
    primary = sys.argv[1]
    secondaries = sys.argv[2:]
    for secondary in secondaries:
        match_strings(primary, secondary)

Running the code:
~/string_matcher.py "Loan Coordinator" "Financial Student Loan Coordinator" "Loan Operator" "Coordinator" "Financial Assistant"
Financial Student Loan Coordinator matches because of coordinator, loan
Loan Operator matches because of loan
Coordinator matches because of coordinator
Financial Assistant doesn't match
Upvotes: 1
Reputation: 118011
You can use set intersection. Make a set of unique words in your string to compare against. Then take the intersection with the set of words from each of the other strings. Keep any string that has a non-empty intersection.
>>> s1 = 'Loan Coordinator'
>>> sList = ['Financial Student Loan Coordinator', 'Loan Operator', 'Coordinator', 'Financial Assistant']
>>> unique = set(s1.split()) # unique words in string 1
>>> [i for i in sList if unique & set(i.split())]
['Financial Student Loan Coordinator', 'Loan Operator', 'Coordinator']
Upvotes: 1