Reputation: 474
I wish to tabulate all the topics from questions asked in a question paper. This is an example of the format of two questions asked in the paper:
question1 = 'Write short notes on the anatomy of the Circle of Willis including normal variants.'
question2 = 'Write short notes on the anatomy of the axis (C2 vertebra).'
From the above questions, I expect to get the topics:
topic1 = 'Circle of Willis including normal variants'
topic2 = 'axis (C2 vertebra)'
For the above, I wrote the following code snippet:
def extract_topic(message):
    message = re.search('Write short notes on the anatomy of the (.+?).', message)
    if message:
        return message.group(1)
Of course, the above code failed miserably! What am I to do? What's the easiest way to do the above? Would using NLTK make the above easy?
Upvotes: 1
Views: 409
Reputation: 573
If your data always follows the format you show, a simple solution is:
question1 = 'Write short notes on the anatomy of the Circle of Willis including normal variants.'
question2 = 'Write short notes on the anatomy of the axis (C2 vertebra).'
list_of_questions = [question1, question2]
topics = [question.split("Write short notes on the anatomy of the ")[1] for question in list_of_questions]
print(topics)
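Note that the split above keeps the full stop at the end of each topic. A minimal sketch that also strips it, assuming every question starts with the same prefix and ends with '.' (the names PREFIX and topic_from_question are made up for this illustration):

```python
# Hypothetical helper: PREFIX and topic_from_question are illustrative names.
PREFIX = "Write short notes on the anatomy of the "

def topic_from_question(question):
    """Return the text after PREFIX without trailing full stops, or None."""
    if not question.startswith(PREFIX):
        return None
    # Slice off the fixed prefix, then drop any trailing '.' characters.
    return question[len(PREFIX):].rstrip(".")

question1 = 'Write short notes on the anatomy of the Circle of Willis including normal variants.'
question2 = 'Write short notes on the anatomy of the axis (C2 vertebra).'
topics = [topic_from_question(q) for q in (question1, question2)]
print(topics)
```

Questions that do not start with the prefix come back as None instead of raising an IndexError, which the plain split would do.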
Upvotes: 0
Reputation: 88
You need to escape the '.' at the end of the pattern, since an unescaped '.' matches any character except a line break. Also, '(.+?)' is non-greedy, so it matches a single character and the '.' after it matches one more character. The code below should work:
import re

def extract_topic(message):
    # Raw string so that \. reaches the regex engine as an escaped full stop
    message = re.search(r'Write short notes on the anatomy of the (.+?)\.', message)
    if message:
        return message.group(1)
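To see the difference, here is a small side-by-side sketch of the original pattern and the escaped one. The helper extract_with and the variable names are only for this illustration:

```python
import re

PREFIX = 'Write short notes on the anatomy of the '

def extract_with(pattern, message):
    """Return group 1 of the first match of pattern in message, or None."""
    match = re.search(pattern, message)
    return match.group(1) if match else None

question = 'Write short notes on the anatomy of the axis (C2 vertebra).'

unescaped = PREFIX + r'(.+?).'   # final '.' matches any character
escaped = PREFIX + r'(.+?)\.'    # final '\.' matches a literal full stop only

print(extract_with(unescaped, question))  # a
print(extract_with(escaped, question))    # axis (C2 vertebra)
```

With the unescaped pattern the lazy group stops after one character because the following '.' is happy to match the next character, whatever it is.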
Upvotes: 0
Reputation: 1146
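Try this. The greedy (.*) takes everything up to the last character, and the final . then matches the closing full stop:
def extract_topic(message):
    message = re.search('Write short notes on the anatomy of the (.*).', message)
    if message:
        return message.group(1)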
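For tabulating a whole paper, a slightly stricter variant might look like the sketch below. It assumes each question is one string in a list, and it uses re.fullmatch so that a question must both start with the prefix and end with a literal full stop. The names TOPIC_PATTERN and tabulate_topics are hypothetical:

```python
import re

# Greedy group followed by an escaped full stop; fullmatch anchors both ends.
TOPIC_PATTERN = re.compile(r'Write short notes on the anatomy of the (.+)\.')

def tabulate_topics(questions):
    """Collect the topic of every question that fits the expected format."""
    topics = []
    for question in questions:
        match = TOPIC_PATTERN.fullmatch(question.strip())
        if match:
            topics.append(match.group(1))
    return topics

paper = [
    'Write short notes on the anatomy of the Circle of Willis including normal variants.',
    'Write short notes on the anatomy of the axis (C2 vertebra).',
    'Describe the blood supply of the heart.',  # different format, skipped
]
print(tabulate_topics(paper))
```

Questions in another format are skipped silently here; whether to skip or report them depends on how uniform the paper really is.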
Upvotes: 2