Reputation: 11
I have a dictionary of degrees that can be obtained from a university. The dictionary looks like this:
deg_dict = {
    'Doctor of Philosophy': ['PhD', 'Ph.D.', 'Doctor of Philosophy'],
    'Bachelor of Science': ['BS', 'B.S.', 'BSc', 'B.Sc.'],
    'Master of Arts': ['MA', 'M.A.'],
}
I also have a list of phrases, and I want to find the indices of the phrases in that list that contain any of the values in the degree dictionary.
phrase_list = ['Lisa has a Ph.D.', 'Maggie earned her B.S. from Duke University', 'Bart dropped out of his MA program', 'I made this out of thin air']
I can do this using this code:
degindex = [i for i, s in enumerate(phrase_list) for key, value in deg_dict.items() for deg in value if deg in s]
However, this is quite messy and pulls out indices from phrase_list that are nonspecific. For example, degindex would return all 4 indices from phrase_list, because "of" appears in the last item of phrase_list and is part of the dictionary value 'Doctor of Philosophy'. Additionally, the last index would be pulled out because the letters 'ma' appear in the word 'made', and 'MA' is a value under the 'Master of Arts' key in deg_dict.
How can I match the dictionary values only as whole terms, so that an index from phrase_list is returned only if the entire phrase 'Doctor of Philosophy' is found in a phrase, or if the letters 'MA' appear on their own (not inside a word)?
Upvotes: 0
Views: 209
Reputation: 29
If you want the index, replace print(deg_dict[word]) on line 6 of 0liveradam8's answer with the following line:
print(sentence.find(word))
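Note that str.find returns the character offset of the abbreviation within the sentence. If what is wanted is instead the position of each matching phrase within phrase_list, as the question describes, a minimal sketch using enumerate might look like the following. The function name matching_indices is hypothetical, and the dictionary is the one from 0liveradam8's answer, repeated here so the example runs on its own.

```python
deg_dict = {
    'PhD': 'Doctor of Philosophy',
    'Ph.D.': 'Doctor of Philosophy',
    'BS': 'Bachelor of Science',
    'B.S.': 'Bachelor of Science',
    'BSc': 'Bachelor of Science',
    'B.Sc.': 'Bachelor of Science',
    'MA': 'Master of Arts',
    'M.A.': 'Master of Arts',
}

phrase_list = ['Lisa has a Ph.D.',
               'Maggie earned her B.S. from Duke University',
               'Bart dropped out of his MA program',
               'I made this out of thin air']


def matching_indices(phrases, abbreviations):
    """Return the indices of phrases containing a space-delimited abbreviation."""
    return [i for i, sentence in enumerate(phrases)
            if any(word in abbreviations for word in sentence.split())]


print(matching_indices(phrase_list, deg_dict))  # [0, 1, 2]
```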
Upvotes: 1
Reputation: 778
First off, let's change your dictionary so that it functions as desired.
deg_dict = {
    'PhD': 'Doctor of Philosophy',
    'Ph.D.': 'Doctor of Philosophy',
    'BS': 'Bachelor of Science',
    'B.S.': 'Bachelor of Science',
    'BSc': 'Bachelor of Science',
    'B.Sc.': 'Bachelor of Science',
    'MA': 'Master of Arts',
    'M.A.': 'Master of Arts',
}
With this dictionary, if you look up the abbreviation for a degree like this: deg_dict['PhD'], it will return the full name of the degree: "Doctor of Philosophy"
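If the degree names and their abbreviations are already stored the other way round (full name mapped to a list of abbreviations, as the question intends), the lookup dictionary does not have to be typed by hand. The following is only a sketch of one way to invert such a mapping; the name name_to_abbrevs is hypothetical.

```python
# Hypothetical starting point: full degree name -> list of abbreviations.
name_to_abbrevs = {
    'Doctor of Philosophy': ['PhD', 'Ph.D.'],
    'Bachelor of Science': ['BS', 'B.S.', 'BSc', 'B.Sc.'],
    'Master of Arts': ['MA', 'M.A.'],
}

# Invert it so that each abbreviation maps to its full degree name.
deg_dict = {abbrev: name
            for name, abbrevs in name_to_abbrevs.items()
            for abbrev in abbrevs}

print(deg_dict['PhD'])  # Doctor of Philosophy
```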
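Now, using the code below, we can check whether each phrase contains an abbreviation and print the full name of the degree. Because each sentence is split on spaces, an abbreviation only matches when it stands alone as a word, so 'made' does not match 'MA'. Please note that if a sentence contains multiple abbreviations, only the first one is extracted.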
phrase_list = ['Lisa has a Ph.D.', 'Maggie earned her B.S. from Duke University', 'Bart dropped out of his MA program', 'I made this out of thin air']
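for sentence in phrase_list:
    for word in sentence.split(" "):
        if word in deg_dict:
            print(deg_dict[word])
            break  # stop at the first abbreviation in this sentence
    else:
        # for/else: runs only when the inner loop finished without a break
        print("No abbreviation found in sentence.")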
Output:
Doctor of Philosophy
Bachelor of Science
Master of Arts
No abbreviation found in sentence.
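Splitting on spaces will miss an abbreviation that has punctuation attached (for example 'PhD,'), and it cannot match a multi-word name such as 'Doctor of Philosophy'. One possible alternative, shown here only as a sketch, is a regular expression that requires each term not to be directly preceded or followed by a word character. The names degree_pattern and find_degree_indices are hypothetical, and the dictionary is repeated so the example runs on its own.

```python
import re

deg_dict = {
    'PhD': 'Doctor of Philosophy',
    'Ph.D.': 'Doctor of Philosophy',
    'BS': 'Bachelor of Science',
    'B.S.': 'Bachelor of Science',
    'BSc': 'Bachelor of Science',
    'B.Sc.': 'Bachelor of Science',
    'MA': 'Master of Arts',
    'M.A.': 'Master of Arts',
}

# Match both the abbreviations (keys) and the full degree names (values).
terms = set(deg_dict) | set(deg_dict.values())
# Longer terms first, so a longer term is preferred over a shorter one
# that could match at the same position.
alternatives = '|'.join(re.escape(t)
                        for t in sorted(terms, key=len, reverse=True))
# (?<!\w) and (?!\w): the match must not touch a letter, digit, or underscore.
degree_pattern = re.compile(r'(?<!\w)(?:' + alternatives + r')(?!\w)')


def find_degree_indices(phrases):
    """Return the indices of phrases that contain a degree term as a whole unit."""
    return [i for i, phrase in enumerate(phrases)
            if degree_pattern.search(phrase)]


phrase_list = ['Lisa has a Ph.D.',
               'Maggie earned her B.S. from Duke University',
               'Bart dropped out of his MA program',
               'I made this out of thin air']
print(find_degree_indices(phrase_list))  # [0, 1, 2]
```

The match is case-sensitive, so 'made' is never confused with 'MA'. Lookarounds are used instead of \b because several terms end in a period, and \b does not behave as a word boundary after a trailing '.' at the end of a sentence.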
Upvotes: 2