RHK-S8

Reputation: 329

Python regular expressions for simple questions

I wish to let the user ask a simple question, so I can extract a few standard elements from the string entered.

Examples of strings to be entered:

Who is the director of The Dark Knight?
What is the capital of China?
Who is the president of USA?

As you can see, the question sometimes starts with "Who" and sometimes with "What", so I'm most likely looking for the "|" operator. I need to extract two things from these strings: the word after "the" and before "of", and the text after "of" (which can be several words, as in "The Dark Knight").

For example:

1st sentence: I wish to extract "director" and place it in a variable called Relation, and extract "The Dark Knight" and place it in a variable called Concept.

Desired output:

RelationVar = "director"
ConceptVar = "The Dark Knight"

2nd sentence: I wish to extract "capital" and assign it to the variable Relation, and extract "China" and place it in the variable Concept.

RelationVar = "capital"
ConceptVar = "China"

Any ideas on how to do this with the re.match function, or with any other method?

Upvotes: 1

Views: 104

Answers (2)

Dinever

Reputation: 690

Here is a script. You can use | inside the parentheses to match either alternative, "Who" or "What".

This worked fine for me:

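import re

questions = ['Who is the director of The Dark Knight?',
             'What is the capital of China?',
             'Who is the president of USA?']

# Compile once, outside the loop. Group 1 is Who/What, group 2 the relation,
# group 3 the concept; the trailing \? keeps the question mark out of group 3.
pattern = re.compile(r'(What|Who) is the (.+) of (.+)\?')

for string in questions:
    nodes = pattern.findall(string)
    Relation = nodes[0][1]
    Concept = nodes[0][2]
    print(Relation)
    print(Concept)
    print('----')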
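It can help to print what findall returns here. With more than one capturing group, each match comes back as a tuple of the groups in order, and the first element is the captured "Who"/"What", so the relation and concept are at indexes 1 and 2. A small Python 3 sketch using the first example question:

```python
import re

pattern = re.compile(r'(What|Who) is the (.+) of (.+)\?')

# With several capturing groups, findall returns a list of tuples,
# one tuple per match, holding the groups in order.
nodes = pattern.findall('Who is the director of The Dark Knight?')
print(nodes)  # [('Who', 'director', 'The Dark Knight')]

# Index 0 is the interrogative, so the relation and concept are 1 and 2.
interrogative, relation, concept = nodes[0]
print(relation)  # director
print(concept)   # The Dark Knight
```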

Best Regards:)

Upvotes: 1

Nolen Royalty

Reputation: 18633

You're correct that you want to use | for who/what. The rest of the regex is very simple. The group names are there for clarity, but you could use r"(?:Who|What) is the (.+) of (.+)[?]" instead.

>>> import re
>>> r = r"(?:Who|What) is the (?P<RelationVar>.+) of (?P<ConceptVar>.+)[?]"
>>> l = ['Who is the director of The Dark Knight?', 'What is the capital of China?', 'Who is the president of USA?']
>>> [re.match(r, i).groupdict() for i in l]
[{'RelationVar': 'director', 'ConceptVar': 'The Dark Knight'}, {'RelationVar': 'capital', 'ConceptVar': 'China'}, {'RelationVar': 'president', 'ConceptVar': 'USA'}]
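One caveat with this pattern: .+ is greedy, so if the concept itself contains " of ", the relation group runs up to the last " of " rather than the first. Making the relation group lazy (.+?) splits on the first " of " instead. The question below is a made-up example, and which split is right depends on the questions you expect:

```python
import re

question = "What is the capital of the state of Texas?"

greedy = r"(?:Who|What) is the (?P<RelationVar>.+) of (?P<ConceptVar>.+)[?]"
lazy = r"(?:Who|What) is the (?P<RelationVar>.+?) of (?P<ConceptVar>.+)[?]"

# Greedy: the relation group takes as much as it can, stopping at the last " of ".
print(re.match(greedy, question).groupdict())
# {'RelationVar': 'capital of the state', 'ConceptVar': 'Texas'}

# Lazy: the relation group takes as little as it can, stopping at the first " of ".
print(re.match(lazy, question).groupdict())
# {'RelationVar': 'capital', 'ConceptVar': 'the state of Texas'}
```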

(?:...) is a non-capturing group. Change (?:Who|What) to (Who|What) if you also want to capture whether the question uses who or what.
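For example, the interrogative can be given a name of its own so it shows up in groupdict() alongside the other two. The group name Question is a hypothetical choice for illustration:

```python
import re

# Same pattern as above, but Who/What is now a named capturing group
# instead of a non-capturing (?:...) group. "Question" is an arbitrary name.
r = r"(?P<Question>Who|What) is the (?P<RelationVar>.+) of (?P<ConceptVar>.+)[?]"

m = re.match(r, "Who is the president of USA?")
print(m.groupdict())
# {'Question': 'Who', 'RelationVar': 'president', 'ConceptVar': 'USA'}
```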

Extracting the data and assigning it to variables is then very simple:

>>> m = re.match(r, "What is the capital of China?")
>>> d = m.groupdict()
>>> relation_var = d["RelationVar"]
>>> concept_var = d["ConceptVar"]
>>> relation_var
'capital'
>>> concept_var
'China'
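Since the questions come from a user, re.match returns None for input that doesn't follow the pattern, and calling .groupdict() on None raises AttributeError. A minimal sketch that guards against this; the helper name parse_question and the use of re.IGNORECASE (so "who is the ..." also matches) are assumptions added for illustration:

```python
import re

# Compiled once; re.IGNORECASE is an assumption so lowercase input also matches.
QUESTION = re.compile(
    r"(?:Who|What) is the (?P<RelationVar>.+) of (?P<ConceptVar>.+)[?]",
    re.IGNORECASE,
)

def parse_question(text):
    """Hypothetical helper: return (relation, concept), or None if text doesn't fit."""
    m = QUESTION.match(text.strip())
    if m is None:
        return None
    return m.group("RelationVar"), m.group("ConceptVar")

print(parse_question("What is the capital of China?"))            # ('capital', 'China')
print(parse_question("who is the director of The Dark Knight?"))  # ('director', 'The Dark Knight')
print(parse_question("Where is the capital of China?"))           # None
```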

Upvotes: 1
