Reputation: 11
I need to parse a series of short strings, each made up of three parts: a question and two possible answers. Each string follows a consistent format:
This is the question "answer_option_1 is in quotes" "answer_option_2 is in quotes"
I need to identify the question part and the two possible answer choices that are in single or double quotes.
Ex.:
What color is the sky today? "blue" or "grey"
Who will win the game 'Michigan' 'Ohio State'
How do I do this in Python?
Upvotes: 1
Views: 352
Reputation: 63709
Pyparsing will give you a solution that will adapt to some variability in the input text:
questions = """\
What color is the sky today? "blue" or "grey"
Who will win the game 'Michigan' 'Ohio State'""".splitlines()
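from pyparsing import *
# strip the surrounding quotes from every matched quoted string
quotedString.setParseAction(removeQuotes)
# Q: everything up to the first quoted string; A: quoted strings, optionally separated by "or"
q_and_a = SkipTo(quotedString)("Q") + delimitedList(quotedString, Optional("or"))("A")
for qn in questions:
    print(qn)
    qa = q_and_a.parseString(qn)
    print("qa.Q", qa.Q)
    print("qa.A", qa.A)
    print()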
Will print:
What color is the sky today? "blue" or "grey"
qa.Q What color is the sky today?
qa.A ['blue', 'grey']
Who will win the game 'Michigan' 'Ohio State'
qa.Q Who will win the game
qa.A ['Michigan', 'Ohio State']
Upvotes: 0
Reputation: 123612
If your format is as simple as you say (i.e. not as in your examples), you don't need regex. Just split the line:
>>> line = 'What color is the sky today? "blue" "grey"'.strip('"')
>>> questions, answers = line.split('"', 1)
>>> answer1, answer2 = answers.split('" "')
>>> questions
'What color is the sky today? '
>>> answer1
'blue'
>>> answer2
'grey'
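The same split can be wrapped in a small function. This is only a sketch, assuming Python 3, exactly two answers, and a single known quote character per line; the name `split_on_quotes` and its `quote` parameter are made up for illustration. It raises `ValueError` when the answers are not separated by a single space (for example when there is an "or" between them).

```python
def split_on_quotes(line, quote='"'):
    """Split 'question "a1" "a2"' into (question, a1, a2) without regex."""
    line = line.strip()
    # Drop the closing quote of the second answer, if present.
    if line.endswith(quote):
        line = line[:-1]
    # Everything before the first quote is the question.
    question, _, answers = line.partition(quote)
    # The two answers are separated by: closing quote, space, opening quote.
    answer1, sep, answer2 = answers.partition(quote + " " + quote)
    if not sep:
        raise ValueError("expected two quoted answers separated by a space")
    return question.rstrip(), answer1, answer2


print(split_on_quotes('What color is the sky today? "blue" "grey"'))
print(split_on_quotes("Who will win the game 'Michigan' 'Ohio State'", quote="'"))
```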
Upvotes: 1
Reputation: 28154
>>> import re
>>> s = "Who will win the game 'Michigan' 'Ohio State'"
>>> re.match(r'(.+)\s+([\'"])(.+?)\2\s+([\'"])(.+?)\4', s).groups()
('Who will win the game', "'", 'Michigan', "'", 'Ohio State')
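Note that the pattern above also returns the quote characters (groups 2 and 4) and does not allow the "or" from the first example. A possible variation, sketched for Python 3, uses named groups and an optional "or"; the function name `parse_question` and the group names are illustrative, not part of the answer above.

```python
import re

# Each answer must close with the same quote character that opened it
# (named backreferences), and an optional "or" may sit between the answers.
QA_PATTERN = re.compile(
    r"""^(?P<question>.+?)\s+              # question text before the first answer
        (?P<q1>['"])(?P<a1>.+?)(?P=q1)     # first quoted answer
        \s+(?:or\s+)?                      # whitespace, optionally the word or
        (?P<q2>['"])(?P<a2>.+?)(?P=q2)     # second quoted answer
        \s*$""",
    re.VERBOSE,
)


def parse_question(line):
    """Return (question, answer1, answer2), or None if the line does not match."""
    m = QA_PATTERN.match(line.strip())
    if m is None:
        return None
    return m.group("question"), m.group("a1"), m.group("a2")


print(parse_question("Who will win the game 'Michigan' 'Ohio State'"))
print(parse_question('What color is the sky today? "blue" or "grey"'))
```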
Upvotes: 1
Reputation: 66709
One possibility is to use a regex:
import re
robj = re.compile(r'^(.*) [\"\'](.*)[\"\'].*[\"\'](.*)[\"\']')
str1 = "Who will win the game 'Michigan' 'Ohio State'"
r1 = robj.match(str1)
print(r1.groups())
str2 = 'What color is the sky today? "blue" or "grey"'
r2 = robj.match(str2)
print(r2.groups())
Output:
('Who will win the game', 'Michigan', 'Ohio State')
('What color is the sky today?', 'blue', 'grey')
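If the number of answers might vary, another option is to collect every quoted substring and treat the text before the first one as the question. A rough sketch for Python 3 follows; `extract_quoted` is a made-up name, and it assumes the question itself contains no quote characters (an apostrophe such as in "What's" would confuse it).

```python
import re

# A quote character, the shortest possible content, then the same quote again.
QUOTED = re.compile(r"""(['"])(.*?)\1""")


def extract_quoted(line):
    """Return (question, [answers]) for a line with zero or more quoted answers."""
    matches = list(QUOTED.finditer(line))
    if not matches:
        return line.strip(), []
    # The question is whatever precedes the first quoted answer.
    question = line[:matches[0].start()].strip()
    answers = [m.group(2) for m in matches]
    return question, answers


print(extract_quoted('What color is the sky today? "blue" or "grey"'))
print(extract_quoted("Who will win the game 'Michigan' 'Ohio State'"))
```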
Upvotes: 0