Reputation: 119
Let's say that I have a free-form text filled with information about specific cars, car brands and other automotive-related information. I want to extract this information from the text following a certain template:
For example: "Mike drove away in a black Mercedes with four other people. Moreover he also owns a BMW M3 in Europe."
Template 1: Brand: Mercedes, Model: -, Color: Black
Template 2: Brand: BMW, Model: M3, Color: -
What is the best way to tackle this in Python? Although I have some knowledge of NLTK, POS tagging and NP chunking, I think it could be done in an easier way if I can recognize specific terms from, for example, a (nested) dictionary that contains lists. It would then behave like a controlled vocabulary.
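A minimal sketch of the nested-dictionary idea, assuming a hypothetical vocabulary in which each brand maps to a list of its models and colours are a plain list (the names `VOCABULARY`, `build_lookup` and `tag` are illustrative, not from any library). The nested structure is flattened into one lookup table so that each word in the text can be classified with a single dictionary access:

```python
import string

# Hypothetical controlled vocabulary: each brand maps to a list of its models
VOCABULARY = {
    'Brand': {
        'Mercedes': ['C-Class', 'E-Class'],
        'BMW': ['M3', 'X5'],
    },
    'Color': ['black', 'white', 'red'],
}


def build_lookup(vocabulary):
    """Flatten the nested vocabulary into {lower-cased term: (category, term)}."""
    lookup = {}
    for brand, models in vocabulary['Brand'].items():
        lookup[brand.lower()] = ('Brand', brand)
        for model in models:
            lookup[model.lower()] = ('Model', model)
    for color in vocabulary['Color']:
        lookup[color.lower()] = ('Color', color)
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def tag(text):
    """Return (category, term) pairs for every vocabulary word in the text."""
    tags = []
    for raw in text.split():
        # Drop surrounding punctuation such as the full stop in "Europe."
        word = raw.strip(string.punctuation).lower()
        if word in LOOKUP:
            tags.append(LOOKUP[word])
    return tags


print(tag("Mike drove away in a black Mercedes with four other people. "
          "Moreover he also owns a BMW M3 in Europe."))
# [('Color', 'black'), ('Brand', 'Mercedes'), ('Brand', 'BMW'), ('Model', 'M3')]
```

Because models are stored under their brand, the same vocabulary could also be used to check that a tagged model actually belongs to the neighbouring brand.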
Hopefully someone has a nice example or can point me in the right direction. Thanks!
Upvotes: 0
Views: 376
Reputation: 15513
Assumptions:
- You have a vocabulary (lists of known terms) like this:
Brand = ['Mercedes', 'BMW']
Model = ['M3']
Color = ['black']
- The three keywords always appear in the following order in the text:
Color Brand Model
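The ordering assumption can also be encoded directly in a regular expression built from the same three lists. This is a swapped-in technique (regex matching with Python's `re` module instead of splitting on spaces), sketched under the assumption that vocabulary terms are separated by whitespace; `alternation` is a hypothetical helper. As a side effect it is case-insensitive and tolerates trailing punctuation such as "BMW.":

```python
import re

Brand = ['Mercedes', 'BMW']
Model = ['M3']
Color = ['black']


def alternation(terms):
    """Build a regex alternation, longest terms first so longer names win."""
    return '|'.join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Optional colour, then a brand, then an optional model, in exactly that order.
pattern = re.compile(
    r'\b(?:(?P<Color>{c})\s+)?(?P<Brand>{b})\b(?:\s+(?P<Model>{m})\b)?'.format(
        c=alternation(Color), b=alternation(Brand), m=alternation(Model)),
    re.IGNORECASE)

text = ("Mike drove away in a black Mercedes with four other people. "
        "Moreover he also owns a BMW M3 in Europe.")
for match in pattern.finditer(text):
    # groupdict() yields None for a group that did not take part in the match
    print(match.groupdict())
```

Each match corresponds to one template; a colour or model that is not adjacent to the brand is simply left as None, exactly as in the word-by-word approach.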
Using your example text, I got the following result:
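text = "Mike drove away in a black Mercedes with four other people. Moreover he also owns a BMW M3 in Europe."

words = text.split(' ')
templates = []
for i, word in enumerate(words):
    if word in Brand:
        template = {'Brand': word, 'Model': None, 'Color': None}
        # A color, if present, comes directly before the brand
        # (check i > 0 so words[-1] does not wrap around to the last word)
        if i > 0 and words[i-1] in Color:
            template['Color'] = words[i-1]
        # A model, if present, comes directly after the brand
        # (bounds check avoids an IndexError when the brand is the last word)
        if i + 1 < len(words) and words[i+1] in Model:
            template['Model'] = words[i+1]
        templates.append(template)
print(templates)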
[{'Brand': 'Mercedes', 'Model': None, 'Color': 'black'}, {'Brand': 'BMW', 'Model': 'M3', 'Color': None}]
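The question asks for templates in the form "Brand: Mercedes, Model: -, Color: Black", while the loop above produces dictionaries with None for missing fields. A small hypothetical helper, `format_template`, could render them in that layout (the colour keeps the casing of the vocabulary entry, so it prints as "black"):

```python
def format_template(template):
    """Render one template dict, printing missing fields as '-'."""
    return ', '.join(
        '{}: {}'.format(key, template.get(key) or '-')
        for key in ('Brand', 'Model', 'Color'))


templates = [{'Brand': 'Mercedes', 'Model': None, 'Color': 'black'},
             {'Brand': 'BMW', 'Model': 'M3', 'Color': None}]
for n, template in enumerate(templates, start=1):
    print('Template {}: {}'.format(n, format_template(template)))
# Template 1: Brand: Mercedes, Model: -, Color: black
# Template 2: Brand: BMW, Model: M3, Color: -
```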
Upvotes: 1