Boyos123

Reputation: 119

Find specific string objects in text

Let's say that I have free text filled with information about specific cars, car brands and other automotive-related information. I want to extract this information from the text following a certain template:

For example: "Mike drove away in a black Mercedes with four other people. Moreover he also owns a BMW M3 in Europe."

Template 1: Brand: Mercedes, Model: -, Color: Black

Template 2: Brand: BMW, Model: M3, Color: -

What is the best way to tackle this in Python? Although I have some knowledge of NLTK, POS tagging and NP chunking, I think it could be done in an easier way once I can recognize specific terms, for example from a (nested) dictionary that contains lists. That way, it would behave like a controlled vocabulary.

Hopefully, someone has a nice example or can point me in the right direction. Thanks.
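A minimal sketch of the controlled-vocabulary idea, assuming a hypothetical nested dictionary VOCABULARY in which each brand maps to a list of its known models and colors are a flat list. Inverting it into one lookup table classifies each token with a single dictionary access. Tokenizing with a \w+ regular expression drops punctuation such as the period in "Europe.", and lowercasing makes matching case-insensitive. The names VOCABULARY, build_lookup and tag_tokens are illustrative only.

```python
import re

# Hypothetical controlled vocabulary: each brand maps to its known models,
# colors are a flat list. Replace with real data.
VOCABULARY = {
    "Brand": {
        "Mercedes": [],
        "BMW": ["M3"],
    },
    "Color": ["black"],
}


def build_lookup(vocabulary):
    """Invert the nested vocabulary into {lowercased term: (field, canonical term)}."""
    lookup = {}
    for brand, models in vocabulary["Brand"].items():
        lookup[brand.lower()] = ("Brand", brand)
        for model in models:
            lookup[model.lower()] = ("Model", model)
    for color in vocabulary["Color"]:
        lookup[color.lower()] = ("Color", color)
    return lookup


def tag_tokens(text, lookup):
    """Return (field, term) for every token found in the vocabulary, in text order."""
    tokens = re.findall(r"\w+", text)  # word characters only, punctuation is dropped
    return [lookup[t.lower()] for t in tokens if t.lower() in lookup]


text = "Mike drove away in a black Mercedes with four other people. Moreover he also owns a BMW M3 in Europe."
print(tag_tokens(text, build_lookup(VOCABULARY)))
# [('Color', 'black'), ('Brand', 'Mercedes'), ('Brand', 'BMW'), ('Model', 'M3')]
```

Because each tagged term keeps its field, the sequence can then be grouped into templates, for example by starting a new template at every brand.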

Upvotes: 0

Views: 376

Answers (1)

stovfl

Reputation: 15513

Assumptions:

  1. You have a controlled vocabulary with one list of known terms per field:
    Brand = ['Mercedes', 'BMW']
    Model = ['M3']
    Color = ['black']
  2. The three keywords always appear in the following order in the text:
    Color Brand Model
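Under assumption 2 (Color Brand Model), the neighbour checks can also be expressed as a single regular expression built from the vocabulary lists. This is a sketch of an alternative technique, not the approach used below; build_pattern and extract are hypothetical names. Color and Model are optional groups, so a missing field comes back as None, and matching is case-insensitive.

```python
import re

Brand = ['Mercedes', 'BMW']
Model = ['M3']
Color = ['black']


def alternation(terms):
    # Escape each term and try longer terms first, so a longer term wins over its prefix
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


def build_pattern(brands, models, colors):
    # Optional color, required brand, optional model, bounded by word boundaries
    return re.compile(
        rf"\b(?:(?P<Color>{alternation(colors)})\s+)?"
        rf"(?P<Brand>{alternation(brands)})"
        rf"(?:\s+(?P<Model>{alternation(models)}))?\b",
        re.IGNORECASE,
    )


def extract(text, pattern):
    # One dict per match; unmatched optional groups are None
    return [m.groupdict() for m in pattern.finditer(text)]


text = "Mike drove away in a black Mercedes with four other people. Moreover he also owns a BMW M3 in Europe."
print(extract(text, build_pattern(Brand, Model, Color)))
```

Since the pattern relies on \s+ and \b rather than splitting on single spaces, a model directly followed by punctuation (such as "M3." at the end of a sentence) is still recognized.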

Using your example text, I got the following result:

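# Controlled vocabulary and input text from above
Brand = ['Mercedes', 'BMW']
Model = ['M3']
Color = ['black']
text = "Mike drove away in a black Mercedes with four other people. Moreover he also owns a BMW M3 in Europe."

words = text.split()  # split on any run of whitespace
templates = []
for i, word in enumerate(words):
    if word in Brand:
        template = {'Brand': word, 'Model': None, 'Color': None}
        # Color directly precedes the brand; i > 0 avoids wrapping around to words[-1]
        if i > 0 and words[i-1] in Color:
            template['Color'] = words[i-1]
        # Model directly follows the brand; bounds check avoids an IndexError at the end
        if i + 1 < len(words) and words[i+1] in Model:
            template['Model'] = words[i+1]
        templates.append(template)

print(templates)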

[{'Brand': 'Mercedes', 'Model': None, 'Color': 'black'}, {'Brand': 'BMW', 'Model': 'M3', 'Color': None}]
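To print these results in the template format from the question, a small helper (the name format_template is hypothetical) can substitute "-" for fields that were not found, emitting them in the question's order Brand, Model, Color. Values keep the spelling found in the text, so the color prints as "black" rather than "Black".

```python
def format_template(template):
    """Render {'Brand': ..., 'Model': ..., 'Color': ...} as 'Brand: X, Model: Y, Color: Z'."""
    fields = ("Brand", "Model", "Color")
    # Missing or None values are shown as '-', as in the question
    return ", ".join(f"{f}: {template.get(f) or '-'}" for f in fields)


templates = [
    {'Brand': 'Mercedes', 'Model': None, 'Color': 'black'},
    {'Brand': 'BMW', 'Model': 'M3', 'Color': None},
]
for n, t in enumerate(templates, start=1):
    print(f"Template {n}: {format_template(t)}")
# Template 1: Brand: Mercedes, Model: -, Color: black
# Template 2: Brand: BMW, Model: M3, Color: -
```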

Upvotes: 1
