lizzard

Reputation: 11

How to match a complete string from a dictionary to a string in a Python list?

I have a dictionary of degrees that can be obtained from a university. The dictionary looks like this:

deg_dict = {
    'Doctor of Philosophy': ['PhD', 'Ph.D.', 'Doctor of Philosophy'],
    'Bachelor of Science': ['BS', 'B.S.', 'BSc', 'B.Sc.'],
    'Master of Arts': ['MA', 'M.A.']
}

I also have a list of phrases, and I want to find the indices of the phrases in that list that contain any of the values in the degree dictionary:

phrase_list = ['Lisa has a Ph.D.', 'Maggie earned her B.S. from Duke University', 'Bart dropped out of his MA program', 'I made this out of thin air']

I can do this using this code:

degindex = [i for i, s in enumerate(phrase_list) for key, value in deg_dict.items() for deg in value if deg in s]

However, this is quite messy, and because the check deg in s is a plain substring test, it can pull out indices from phrase_list for phrases that don't actually mention a degree: an abbreviation can match inside an unrelated word. For example, with a case-insensitive comparison the last phrase would be returned, because the letters 'ma' appear in the word 'made' and 'MA' is a value under the 'Master of Arts' key in deg_dict.
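The difference between a substring test and a whole-word test shows up on the last phrase. A minimal sketch, lowercasing both sides to mimic a case-insensitive comparison (an assumption; the code above is case-sensitive):

```python
phrase = 'I made this out of thin air'
abbrev = 'MA'

# Substring test: 'ma' is found inside the word 'made'.
substring_hit = abbrev.lower() in phrase.lower()

# Whole-word test: compare against the whitespace-separated words instead.
word_hit = abbrev.lower() in phrase.lower().split()

print(substring_hit, word_hit)  # True False
```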

How can I make the dictionary values match only as whole units, so that an index from phrase_list is returned only if a phrase contains the entire string 'Doctor of Philosophy', or the letters 'MA' standing on their own rather than inside a word?
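One common way to get whole-word matching is a single regular expression built from every spelling in the dictionary. A minimal sketch, assuming the dictionary maps each full degree name to a list of its spellings (the layout is an assumption, since the snippet in the question is not valid Python as posted):

```python
import re

# Assumed layout: full degree name -> list of accepted spellings.
deg_dict = {
    'Doctor of Philosophy': ['PhD', 'Ph.D.', 'Doctor of Philosophy'],
    'Bachelor of Science': ['BS', 'B.S.', 'BSc', 'B.Sc.'],
    'Master of Arts': ['MA', 'M.A.'],
}

phrase_list = ['Lisa has a Ph.D.', 'Maggie earned her B.S. from Duke University',
               'Bart dropped out of his MA program', 'I made this out of thin air']

# Every spelling, longest first; re.escape makes '.' match a literal dot.
variants = sorted((v for vals in deg_dict.values() for v in vals), key=len, reverse=True)

# (?<!\w) and (?!\w) reject a match that has a letter, digit or underscore
# directly before or after it. Unlike \b, they also work for spellings that
# end in '.', such as 'Ph.D.' at the very end of a phrase.
pattern = re.compile(r'(?<!\w)(?:' + '|'.join(map(re.escape, variants)) + r')(?!\w)')

degindex = [i for i, s in enumerate(phrase_list) if pattern.search(s)]
print(degindex)  # [0, 1, 2]
```

Because the full name 'Doctor of Philosophy' is one of the alternatives, it only matches when the whole phrase is present, while 'MA' inside 'made' or 'BS' inside 'ABSENT' is rejected by the lookarounds.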

Upvotes: 0

Views: 209

Answers (2)

astaning

Reputation: 29

If you want the index, replace print(deg_dict[word]) on line 6 of 0liveradam8's answer with the following line, which prints the character offset at which the abbreviation first appears in the sentence:

print(sentence.find(word))
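If what you need is instead the position of each phrase within phrase_list, as the question asks, a sketch using enumerate around the same word lookup (assuming 0liveradam8's abbreviation-to-name deg_dict):

```python
deg_dict = {'PhD': 'Doctor of Philosophy', 'Ph.D.': 'Doctor of Philosophy',
            'BS': 'Bachelor of Science', 'B.S.': 'Bachelor of Science',
            'BSc': 'Bachelor of Science', 'B.Sc.': 'Bachelor of Science',
            'MA': 'Master of Arts', 'M.A.': 'Master of Arts'}

phrase_list = ['Lisa has a Ph.D.', 'Maggie earned her B.S. from Duke University',
               'Bart dropped out of his MA program', 'I made this out of thin air']

# Pair each phrase index with the full name of the first abbreviation found.
matches = []
for i, sentence in enumerate(phrase_list):
    for word in sentence.split():
        if word in deg_dict:
            matches.append((i, deg_dict[word]))
            break  # first abbreviation only, as in the answer's loop

degindex = [i for i, _ in matches]
print(degindex)  # [0, 1, 2]
print(matches)
# [(0, 'Doctor of Philosophy'), (1, 'Bachelor of Science'), (2, 'Master of Arts')]
```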

Upvotes: 1

0liveradam8

Reputation: 778

First off, let's restructure your dictionary so that each abbreviation is a key that maps to the full name of the degree.

deg_dict = {
    'PhD': 'Doctor of Philosophy',
    'Ph.D.': 'Doctor of Philosophy',
    'BS': 'Bachelor of Science',
    'B.S.': 'Bachelor of Science',
    'BSc': 'Bachelor of Science',
    'B.Sc.': 'Bachelor of Science',
    'MA': 'Master of Arts',
    'M.A.': 'Master of Arts'}

With this dictionary, if you input the abbreviation for a degree like this: deg_dict['PhD'], it will output the full name of the degree like this: "Doctor of Philosophy"
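Rather than typing the reversed dictionary by hand, it could be derived from the question's name-to-spellings layout with a dict comprehension. A sketch, where the name degrees is hypothetical and stands for the question's dictionary in valid Python form:

```python
# Hypothetical name for the question's layout: full name -> spellings.
degrees = {
    'Doctor of Philosophy': ['PhD', 'Ph.D.', 'Doctor of Philosophy'],
    'Bachelor of Science': ['BS', 'B.S.', 'BSc', 'B.Sc.'],
    'Master of Arts': ['MA', 'M.A.'],
}

# Invert it: every spelling becomes a key pointing back to its full name.
deg_dict = {spelling: name
            for name, spellings in degrees.items()
            for spelling in spellings}

print(deg_dict['PhD'])   # Doctor of Philosophy
print(deg_dict['M.A.'])  # Master of Arts
```

The multi-word entry 'Doctor of Philosophy' also becomes a key, but a loop that looks up single space-separated words will never match it.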

Now using this code we can find out whether each phrase contains an abbreviation, and output the full name of the degree. Please note that if a sentence contains multiple abbreviations, only the first one is extracted.

phrase_list = ['Lisa has a Ph.D.', 'Maggie earned her B.S. from Duke University', 'Bart dropped out of his MA program', 'I made this out of thin air']

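for sentence in phrase_list:
    for word in sentence.split():  # whitespace-separated words
        if word in deg_dict:  # exact key match, not a substring test
            print(deg_dict[word])  # full name of the degree
            break  # stop at the first abbreviation
    else:  # runs only if the inner loop never hit break
        print("No abbreviation found in sentence.")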

Output:

Doctor of Philosophy
Bachelor of Science
Master of Arts
No abbreviation found in sentence.
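The word-splitting loop stops at the first abbreviation and misses one with punctuation attached, such as 'MA,' or 'PhD.' at the end of a sentence. A sketch that reports every abbreviation using re.finditer with the same deg_dict; the first test sentence is made up for illustration:

```python
import re

deg_dict = {'PhD': 'Doctor of Philosophy', 'Ph.D.': 'Doctor of Philosophy',
            'BS': 'Bachelor of Science', 'B.S.': 'Bachelor of Science',
            'BSc': 'Bachelor of Science', 'B.Sc.': 'Bachelor of Science',
            'MA': 'Master of Arts', 'M.A.': 'Master of Arts'}

# Longest keys first; re.escape makes '.' match a literal dot.
keys = sorted(deg_dict, key=len, reverse=True)
# Lookarounds: the match may not touch a letter, digit or underscore on either side.
pattern = re.compile(r'(?<!\w)(?:' + '|'.join(map(re.escape, keys)) + r')(?!\w)')

sentences = ['Homer has an MA, a BSc and a PhD.', 'I made this out of thin air']
results = []
for sentence in sentences:
    found = [deg_dict[m.group()] for m in pattern.finditer(sentence)]
    results.append(found)
    print(found or "No abbreviation found in sentence.")
```

Output:

```
['Master of Arts', 'Bachelor of Science', 'Doctor of Philosophy']
No abbreviation found in sentence.
```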

Upvotes: 2
