Reputation: 417
I am currently working on a project that analyzes the quality of examination paper questions. I am using Python 3.4 with NLTK.
First, I want to extract each question separately from the text. The question paper format is given below.
(Q1). What is web 3.0?
(Q2). Explain about blogs.
(Q3). What is mean by semantic web?
and so on ........
Now I want to extract the questions one by one, without the question number (the question number format is always the same as shown above). My result should look something like this:
What is web 3.0?
Explain about blogs.
What is mean by semantic web?
So how can I tackle this problem with Python 3.4 and NLTK?
Thank you
Upvotes: 1
Views: 1611
Reputation: 50200
You'll probably need to detect lines containing a question, then extract the question and drop the question number. The regexp for detecting a question label is
import re

# Matches a leading label such as "(Q1). ", allowing optional leading whitespace
qnum_pattern = r"^\s*\(Q\d+\)\.\s+"
You can use it to pull out the questions like this:
questions = [re.sub(qnum_pattern, "", line)
             for line in text
             if re.search(qnum_pattern, line)]
Obviously, text must be a list of lines or a file open for reading.
But if you had no idea how to approach this, you have your work cut out for you with the rest of the assignment. I recommend spending some time on the Python tutorial or other introductory materials.
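For completeness, here is a minimal, self-contained sketch of the regexp approach above, written for Python 3. The function name `extract_questions` and the variable `paper` are my own illustrative names, not anything from NLTK or the question; the sample text is the one from the question. Lines without a question label (such as the "and so on" line) are simply skipped.

```python
import re

# Matches a leading label such as "(Q1). " or "  (Q12).  "
QNUM_PATTERN = re.compile(r"^\s*\(Q\d+\)\.\s+")


def extract_questions(lines):
    """Return the question text of every line that starts with a (Qn). label.

    Hypothetical helper for illustration; `lines` can be any iterable of
    strings, e.g. a list of lines or a file open for reading.
    """
    questions = []
    for line in lines:
        if QNUM_PATTERN.search(line):
            # Drop the label, then trim the trailing newline/whitespace
            questions.append(QNUM_PATTERN.sub("", line).strip())
    return questions


paper = """(Q1). What is web 3.0?
(Q2). Explain about blogs.
(Q3). What is mean by semantic web?
and so on ........"""

questions = extract_questions(paper.splitlines())
print(questions)
```

To read from a file instead, something like `with open(path) as f: extract_questions(f)` should work the same way, since iterating over a file yields its lines.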
Upvotes: 2
Reputation: 122112
If the (QX) label is always separated from the question text by a space, you can do this:
>>> text = """(Q1). What is web 3.0?
... (Q2). Explain about blogs.
... (Q3). What is mean by semantic web?"""
>>> for line in text.split('\n'):
...     print(line.strip().partition(' ')[2])
...
What is web 3.0?
Explain about blogs.
What is mean by semantic web?
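One caveat with the `partition` approach: it drops the first space-delimited token of every line, whether or not that token is a question label. A small sketch of a guarded variant follows; `strip_label` is a hypothetical helper name, and the check assumes labels always look like `(Q...).` as stated in the question.

```python
def strip_label(line):
    """Drop a leading "(Qn)." label using str.partition.

    Hypothetical helper for illustration: lines whose first token does not
    look like a label are returned unchanged (apart from outer whitespace).
    """
    line = line.strip()
    label, sep, rest = line.partition(' ')
    # Only treat the first token as a label if it looks like "(Q...)."
    if sep and label.startswith('(Q') and label.endswith(').'):
        return rest.strip()
    return line


text = """(Q1). What is web 3.0?
(Q2). Explain about blogs.
(Q3). What is mean by semantic web?"""

for line in text.split('\n'):
    print(strip_label(line))
```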
Upvotes: 1
Reputation: 4163
If every sentence starts with this pattern, what you ask for is easy to parse: you can use split to remove the prefix:
sentences = ["(Q1). What is web 3.0?",
             "(Q2). Explain about blogs.",
             "(Q3). What is mean by semantic web?"]
for sen in sentences:
    # Split once on the first "). " and keep everything after the label
    print(sen.split('). ', 1)[1])
This will print:
What is web 3.0?
Explain about blogs.
What is mean by semantic web?
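The second argument to `split` matters here. A short sketch of why, using a made-up question (not from the original paper) that happens to contain the separator `). ` inside its own text:

```python
# Made-up example line; the question text itself contains "). "
line = "(Q4). Compare (a). blogs and (b). wikis."

# Without a limit, the question text is split apart as well
without_limit = line.split('). ')

# With maxsplit=1, only the first "). " (the one after the label) is used
with_limit = line.split('). ', 1)

print(without_limit)
print(with_limit[1])

# Indexing with [-1] instead of [1] avoids an IndexError on a line that
# has no "). " at all: such a line comes back unchanged.
print("No label here".split('). ', 1)[-1])
```

Note that the `[-1]` variant is only a convenience, assuming unlabeled lines do not themselves contain `). `; if they might, the regexp approach is the safer choice.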
Upvotes: 1