Legion

Reputation: 474

How to extract a topic from a sentence?

I wish to tabulate all the topics from questions asked in a question paper. This is an example of the format of two questions asked in the paper:

question1 = 'Write short notes on the anatomy of the Circle of Willis including normal variants.'
question2 = 'Write short notes on the anatomy of the axis (C2 vertebra).'

From the above questions, I expect to get the topics:

topic1 = 'Circle of Willis including normal variants'
topic2 = 'axis (C2 vertebra)'

For the above, I wrote the following code snippet:

def extract_topic(message):
    message = re.search('Write short notes on the anatomy of the (.+?).', message)
    if message:
        return message.group(1)

Of course, the above code failed miserably! What am I to do? What's the easiest way to do the above? Would using NLTK make the above easy?

Upvotes: 1

Views: 409

Answers (3)

Petr Matuska

Reputation: 573

If your questions always follow the format you show, a fairly simple solution is to split each one on the fixed prefix:

question1 = 'Write short notes on the anatomy of the Circle of Willis including normal variants.'
question2 = 'Write short notes on the anatomy of the axis (C2 vertebra).'

list_of_questions = [question1, question2]

# Take everything after the fixed prefix, then drop the trailing period.
topics = [question.split("Write short notes on the anatomy of the ")[1].rstrip(".") for question in list_of_questions]

print(topics)
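One caveat with the split approach: if a question does not contain the prefix, `split` returns a single-element list and `[1]` raises an `IndexError`. A minimal sketch that returns `None` in that case is shown here; the name `PREFIX` and the `prefix` parameter are illustrative and not part of the original question.

```python
PREFIX = 'Write short notes on the anatomy of the '

def extract_topic(question, prefix=PREFIX):
    """Return the text after `prefix` without the final period, or None."""
    if not question.startswith(prefix):
        return None  # question does not follow the expected format
    topic = question[len(prefix):]
    # Strip a single trailing period, if there is one.
    return topic[:-1] if topic.endswith('.') else topic

print(extract_topic(PREFIX + 'axis (C2 vertebra).'))
```

Checking the prefix explicitly keeps one malformed question from aborting the whole list comprehension.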

Upvotes: 0

samizzy

Reputation: 88

  • Your regex has just one mistake: you forgot to escape the . at the end. An unescaped . matches any character except a line break. Also, (.+?) is non-greedy, so it matches a single character, and the . after it then matches one more character.

The code below should work:

import re

def extract_topic(message):
    # \. matches a literal period, so the lazy group expands up to it.
    message = re.search(r'Write short notes on the anatomy of the (.+?)\.', message)
    if message:
        return message.group(1)
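The difference between the two patterns is easy to check against the sample questions. This is only a quick sketch; the names `PREFIX`, `broken` and `fixed` are made up for the comparison.

```python
import re

PREFIX = 'Write short notes on the anatomy of the '
question1 = PREFIX + 'Circle of Willis including normal variants.'
question2 = PREFIX + 'axis (C2 vertebra).'

# Unescaped dot: the lazy group takes one character, then '.' matches any next character.
broken = re.compile(re.escape(PREFIX) + r'(.+?).')
# Escaped dot: the lazy group has to expand until it reaches a literal period.
fixed = re.compile(re.escape(PREFIX) + r'(.+?)\.')

for q in (question1, question2):
    print(repr(broken.search(q).group(1)), '->', repr(fixed.search(q).group(1)))
```

With the unescaped dot, the captured group is just the first character of each topic.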

Upvotes: 0

VishnuVS

Reputation: 1146

Try this:

import re

def extract_topic(message):
    # Greedy group, then a literal period; an unescaped . would match any character.
    message = re.search(r'Write short notes on the anatomy of the (.*)\.', message)
    if message:
        return message.group(1)
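A greedy group and a lazy group followed by an escaped period give the same result on the two sample questions. They only differ when the topic itself contains a period. The question below is a hypothetical example made up to show that, and `PREFIX`, `lazy` and `greedy` are illustrative names.

```python
import re

PREFIX = 'Write short notes on the anatomy of the '
# Hypothetical question with a period inside the topic.
question = PREFIX + 'axis (C2 vert.) and atlas.'

# Lazy: stops at the first literal period.
lazy = re.search(re.escape(PREFIX) + r'(.+?)\.', question).group(1)
# Greedy: runs to the last literal period.
greedy = re.search(re.escape(PREFIX) + r'(.*)\.', question).group(1)

print(lazy)    # axis (C2 vert
print(greedy)  # axis (C2 vert.) and atlas
```

If topics may contain abbreviations, the greedy form (or a lazy form anchored with `$`) is probably the safer choice.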

Upvotes: 2
