Bharathwaaj

Reputation: 2697

Python parse text and group into different parts

I have text like the following:

  1. This is a first question and can go to multiple paragraphs. Multiple lines. etc.
    (1)First Option (2) Second Option (3) Third option (4) Fourth Option (5) None of these

  2. 8 × ? = 4888 ÷ 4
    (1) 150.75 (2) 125.75 (3) 125.05 (4) 152.75 (5) None of these

  3. (62.5 × 14 × 5) ÷ 25 + 41 =
    (1) 4 (2) 5 (3) 9 (4) 8 (5) 6

  4. (23 × 23 × 23 × 23 × 23 × 23)×
    (1) 32 (2) 30 (3) 9 (4) 7 (5) 11

I would like to parse this into separate parts so that I can iterate over the questions in a for loop and, for each question, iterate over its answers. The rule is that every question starts with an integer at the beginning of a line (^) followed by a dot. The answers are prefixed by the integers 1 to 5 in parentheses, i.e. (1) to (5).
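The two rules above map fairly directly onto regular expressions. This is a minimal sketch, assuming Python 3 and the standard re module; QUESTION_START, ANSWER_MARKER and sample are illustrative names, not part of the question. The question pattern needs re.MULTILINE so that ^ matches at the start of every line, and the anchor matters: an unanchored \d+\. also matches the "150." inside 150.75.

```python
import re

# A question starts with an integer followed by a dot at the start of a line
# (leading indentation allowed). re.MULTILINE makes ^ match after every newline.
QUESTION_START = re.compile(r"^\s*(\d+)\.", re.MULTILINE)

# An answer is prefixed by an integer from 1 to 5 in parentheses, e.g. "(3)".
ANSWER_MARKER = re.compile(r"\(([1-5])\)")

sample = "2. 8 × ? = 4888 ÷ 4\n(1) 150.75 (2) 125.75"

# The anchored pattern finds only the real question number...
print(QUESTION_START.findall(sample))  # ['2']
# ...whereas an unanchored \d+\. also matches inside the decimals.
print(re.findall(r"\d+\.", sample))  # ['2.', '150.', '125.']
# The answer pattern picks up only the parenthesised option numbers.
print(ANSWER_MARKER.findall(sample))  # ['1', '2']
```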

For example, I would like to be able to use the parsed data like this:

for item in parsed_data:
    print(item.text)
    for answer in item.answers:
        print(answer.text)

How can I do this using Python regex?

Upvotes: 0

Views: 82

Answers (1)

Ryan Saxe

Reputation: 17829

Honestly, you can just use re.split() for this:

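import re

# text is the variable with your text
text = text.strip()
# Split only where a line starts with "N." so decimals such as 150.75 or 62.5
# are not mistaken for question numbers.
questions = re.split(r'^\s*\d+\.', text, flags=re.M)
questions = [x.strip() for x in questions if x.strip()]
# Split each question on the answer markers (1) to (5) and trim the pieces.
final = [[p.strip() for p in re.split(r'\([1-5]\)', x)] for x in questions]

for part in final:
    question = part[0]
    print(question)
    for answer in part[1:]:
        print(answer)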
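If you want objects with the item.text and item.answers attributes from the question rather than nested lists, a split-based approach can fill a couple of small dataclasses. This is a sketch under assumptions: Question, Answer and parse_questions are hypothetical names, Python 3.7+ is assumed for dataclasses, and answer markers are taken to be (1) to (5) as the question states. The question pattern is anchored to line starts so decimals like 150.75 are not treated as question numbers, and a capture group in each pattern makes re.split keep the numbers.

```python
import re
from dataclasses import dataclass, field
from typing import List


# Hypothetical containers mirroring the interface asked for in the question.
@dataclass
class Answer:
    number: int
    text: str


@dataclass
class Question:
    number: int
    text: str
    answers: List[Answer] = field(default_factory=list)


# Question number: integer + dot at the start of a line (indentation allowed).
QUESTION_START = re.compile(r"^\s*(\d+)\.", re.MULTILINE)
# Answer marker: (1) to (5); the capture group keeps the number in split output.
ANSWER_MARKER = re.compile(r"\(([1-5])\)")


def parse_questions(text: str) -> List[Question]:
    # With one capture group, split returns [preamble, num1, body1, num2, body2, ...].
    pieces = QUESTION_START.split(text)
    questions = []
    for num, body in zip(pieces[1::2], pieces[2::2]):
        # parts = [question text, n1, a1, n2, a2, ...]
        parts = ANSWER_MARKER.split(body)
        # Collapse newlines and indentation inside the question text.
        question = Question(int(num), " ".join(parts[0].split()))
        for n, a in zip(parts[1::2], parts[2::2]):
            question.answers.append(Answer(int(n), a.strip()))
        questions.append(question)
    return questions


sample = """
  1. This is a first question and can go to multiple paragraphs. Multiple lines. etc.
    (1)First Option (2) Second Option (3) Third option (4) Fourth Option (5) None of these

  2. 8 × ? = 4888 ÷ 4
    (1) 150.75 (2) 125.75 (3) 125.05 (4) 152.75 (5) None of these

  3. (62.5 × 14 × 5) ÷ 25 + 41 =
    (1) 4 (2) 5 (3) 9 (4) 8 (5) 6
"""

for item in parse_questions(sample):
    print(item.text)
    for answer in item.answers:
        print(answer.text)
```

Restricting the answer marker to [1-5] rather than \d+ keeps parenthesised expressions in the question text, such as (62.5 × 14 × 5), from being split.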

Upvotes: 1
