José Garcia

Reputation: 136

Pythonic way to match a string if contained in a set of strings

I am trying to match names by using the first, second, and last names, either in the correct order or not, using all of them or not. So far I've got this code and it sort of works, but I think it's not the right way of doing it. Do you know another way of doing this?

The names in the data set look like this:

name = 'DAVID SCOTT MUSTAIN'

What I want is to match that name if I search for 'DAVID', 'MUSTAIN SCOTT', 'SCOTT DAVID', etc. The function I have so far looks like this:

def search_name(somename):   
    for full_name in some_dataset:
        if set(somename.upper().split()).issubset(full_name.split()):
            print('match:', full_name)

If I input something like 'DAV' or 'SCOT', this will not match anything. How should I proceed in order to make a match even with incomplete names? If I split the names into single letters it will match every name with those letters without checking the order of the letters.

Upvotes: 1

Views: 369

Answers (3)

t.m.adam

Reputation: 15376

You can use any to check whether any word in somename is a substring of any of the words in full_name:

def search_name(somename):   
    for full_name in some_dataset:
        if any(n.upper() in fn for n in somename.split() for fn in full_name.split()):
            print('match:', full_name)
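
For reference, here is a minimal sketch of the same substring check that returns the matches instead of printing them. The dataset parameter and the sample names are assumptions for illustration only; they are not part of the original code.

```python
def search_name(somename, dataset):
    """Return each full name in which at least one query word
    is a substring of one of the name's parts."""
    query_words = somename.upper().split()
    return [
        full_name
        for full_name in dataset
        if any(q in part for q in query_words for part in full_name.split())
    ]


# Hypothetical sample data, for illustration only.
names = ['DAVID SCOTT MUSTAIN', 'JAMES ALAN HETFIELD']
print(search_name('dav', names))
print(search_name('scot alan', names))
```

Note that a single partial hit is enough here: a query such as 'scot alan' matches both sample names, because each of them contains one of the two query words.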

And here is an example using sum and a dictionary to pick the name with the most matches:

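def search_name(somename):
    # For each full name, count the (query word, name part) pairs that match partially.
    matches = {}
    for full_name in some_dataset:
        matches[full_name] = sum(1 for n in somename.split() for fn in full_name.split() if n.upper() in fn)
    # Compute the best score once instead of once per dictionary item.
    best = max(matches.values(), default=0)
    best_matches = [k for k, v in matches.items() if v == best and v != 0]
    for match in best_matches:
        print('match:', match)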

I'm sure there are better ways to write this function, but I'm very sleep deprived.
As for your second question, perhaps you could print or return all the items in the best_matches list?
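
Returning the list could look roughly like the sketch below. The function name best_matches_for, the dataset parameter, and the sample names are all hypothetical and only serve the illustration.

```python
def best_matches_for(somename, dataset):
    """Return the full names with the highest number of partial word matches."""
    query_words = somename.upper().split()
    # Score each full name by counting (query word, name part) pairs
    # where the query word is a substring of the name part.
    scores = {
        full_name: sum(1 for q in query_words for part in full_name.split() if q in part)
        for full_name in dataset
    }
    best = max(scores.values(), default=0)
    if best == 0:
        return []  # nothing matched at all
    return [full_name for full_name, score in scores.items() if score == best]


# Hypothetical sample data, for illustration only.
names = ['DAVID SCOTT MUSTAIN', 'DAVID ELLEFSON']
print(best_matches_for('dav scot', names))
print(best_matches_for('dav', names))
```

With this scoring, 'dav scot' prefers the name that matches both query words, while 'dav' alone returns every name tied for the top score.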

Upvotes: 2

kip

Reputation: 1140

I wrote a small function that uses a few more statements:

def search_name(name, toSearch, num = 2):
    found = []
    for word in name.split():
        search = word[:num]
        for letter in word[num:]:
            search += letter
            isThere = [data for data in toSearch.split() if data in search]
            if isThere:
                found += isThere
                break
    return len(toSearch.split()) == len(found)

name = 'DAVID SCOTT MUSTAIN'
if search_name(name,'TA'):
    print(name)
else:
    print('Nothing')

Is this what you want?

Upvotes: 1

Jon Deaton

Reputation: 4379

I might use

if full_name in somename and not set(full_name.split()) - set(somename.split())

to see if it's a substring and it contains no extra short names.

Upvotes: 0
