Regex for specific Pattern

Question

I have this text:

What can cause skidding on bends?

All of the following can:
SA Faulty shock-absorbers
SA Insufficient or uneven tyre pressure
[| Load is too small

What can cause a dangerous situation?

SA Brakes which engage heavily on one side
SA Too much steering-wheel play

[| Disturbed reception of traffic information on the radio

It starts raining. Why must you immediately increase the safe distance?

What is correct

[| Because the brakes react more quickly

SA Because a greasy film may form which increases the braking distance

SA Because a second greasy film may form which increases the braking distance

What is the text about?

Above are multiple-choice questions, each with several options. The question stem almost always ends with '?', but sometimes there is additional text before the options start. Every option starts with either 'SA' or '[|': options starting with 'SA' are correct, and options starting with '[|' or '[]' are wrong.

What I want to do

I want to split out each question together with all of its options and save them in a Python dictionary or list, ideally as key-value pairs such as {'ques': 'blalal', 'opt1': 'this is option one', 'opt2': 'this is option two'}, and so on.

What I have tried

rx = r'.*\?$\s*\w*(?:SA|\[\|)'

This is the regex101 link.

Ryszard Czech · Accepted Answer

Assuming you have three options at all times:

import re
# `string` holds the text from the question
p = r'(?m)^(?P<ques>\w[^?]*\?)[\s\S]*?^(?P<opt1>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt2>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt3>(?:SA|\[(?:\||\s])).*)'
dt = [m.groupdict() for m in re.finditer(p, string)]

See regex proof and Python proof.

Results:

[{'ques': 'What can cause skidding on bends?',
  'opt1': 'SA Faulty shock-absorbers',
  'opt2': 'SA Insufficient or uneven tyre pressure',
  'opt3': '[| Load is too small'},
 {'ques': 'What can cause a dangerous situation?',
  'opt1': 'SA Brakes which engage heavily on one side',
  'opt2': 'SA Too much steering-wheel play',
  'opt3': '[| Disturbed reception of traffic information on the radio'},
 {'ques': 'It starts raining. Why must you immediately increase the safe distance?',
  'opt1': '[| Because the brakes react more quickly',
  'opt2': 'SA Because a greasy film may form which increases the braking distance',
  'opt3': 'SA Because a second greasy film may form which increases the braking distance'}]
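
The pattern above relies on every question having exactly three options. If the number of options can vary, a line-by-line pass may be easier to maintain than one large regex. The following is only a sketch under some assumptions: a question stem is a single line ending in '?', lines between the stem and the first option can be ignored, and the names parse_questions, OPTION and SAMPLE are made up for this illustration. It also records whether each option is marked correct ('SA') or wrong ('[|', '[]' or '[ ]').

```python
import re

# Hypothetical sample in the same format as the question's text.
SAMPLE = """What can cause skidding on bends?

All of the following can:
SA Faulty shock-absorbers
SA Insufficient or uneven tyre pressure
[| Load is too small

What can cause a dangerous situation?

SA Brakes which engage heavily on one side
[| Disturbed reception of traffic information on the radio
"""

# An option line: a marker ('SA', '[|', '[]' or '[ ]'), whitespace, then the option text.
OPTION = re.compile(r'^(SA|\[\||\[\s?\])\s+(.*)$')


def parse_questions(text):
    """Return a list of {'ques': str, 'options': [{'text': str, 'correct': bool}]}."""
    questions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = OPTION.match(line)
        if m:
            # Attach the option to the most recent question, if there is one.
            if questions:
                marker, body = m.groups()
                questions[-1]['options'].append(
                    {'text': body, 'correct': marker == 'SA'}
                )
        elif line.endswith('?'):
            # Assumption: any non-option line ending in '?' starts a new question.
            questions.append({'ques': line, 'options': []})
        # Any other line (extra text before the options) is skipped.
    return questions


for q in parse_questions(SAMPLE):
    print(q['ques'], len(q['options']))
```

Storing the options as a list rather than as opt1, opt2, ... keys means the structure does not change when a question has two or five options. If the flat keys are required, they can be generated from the list with enumerate.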
