Reputation: 238
This is a sample of the text I am working with.
6) Jake's Taxi Service is a new entrant to the taxi industry. It has achieved success by staking out a unique position in the industry. How did Jake's Taxi Service mostly likely achieve this position?
A) providing long-distance cab fares at a higher rate than competitors; servicing a larger area than competitors
B) providing long-distance cab fares at a lower rate than competitors; servicing a smaller area than competitors
C) providing long-distance cab fares at a higher rate than competitors; servicing the same area as competitors
D) providing long-distance cab fares at a lower rate than competitors; servicing the same area as competitors
Answer: D
I am trying to match the entire question, including the answer options: everything from the question number to the word Answer.
This is my current regex:
((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)
searchCounter is just a variable that corresponds to the current question number, in this case 6. I think the issue has something to do with searching across newlines.
EDIT: Full source code
searchCounter = 1
bookDict = {}
with open('StratMasterKey.txt', 'rt') as myfile:
    for line in myfile:
        question_pattern = re.compile((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)
        result = question_pattern.search(line)
        if result != None:
            bookDict[searchCounter] = result[0]
            searchCounter += 1
Upvotes: 2
Views: 106
Reputation: 626893
The reason your regex fails is that you read the file line by line with `for line in myfile:`, so each call to `search()` sees only one line, while your pattern needs the question number and the word Answer in the same string.
Replace `for line in myfile:` with `contents = myfile.read()` and then use `result = question_pattern.search(contents)` to get the first match, or `result = question_pattern.findall(contents)` to get multiple matches.
A note on the regex: I am not fixing the whole pattern since, as you mentioned, that is out of scope for this question. However, since the input is now a multiline string, you need to remove `re.DOTALL` and use `[\s\S]` where the pattern should match any character, keeping `.` to match any character except a line break. Also, the lookahead construct is redundant: you may safely replace `(?=Answer)` with `Answer`. To check whether there is a match, you may simply use `if result:` and then grab the whole match value with `result.group()`.
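To illustrate the note on `.` versus `[\s\S]`, here is a small sketch that runs three variants of the pattern against a shortened stand-in for the sample question. The `sample` string and the variable names are made up for the demonstration and are not the real file contents.

```python
import re

# Shortened stand-in for the sample question in the post (not the real file).
sample = (
    "6) Jake's Taxi Service is a new entrant to the taxi industry.\n"
    "A) higher rate; larger area\n"
    "D) lower rate; same area\n"
    "Answer: D\n"
    "7) Next question text\n"
)

# Without re.DOTALL, '.' stops at a line break, so '.*?' never reaches 'Answer'.
dot_only = re.search(r"(?<=6\) ).*?Answer.*", sample)

# '[\s\S]' matches any character, including line breaks; the trailing '.*'
# still stops at the end of the 'Answer' line.
any_char = re.search(r"(?<=6\) )[\s\S]*?Answer.*", sample)

# With re.DOTALL, the trailing '.*' also crosses line breaks and runs to the
# end of the string, swallowing the following question.
dotall = re.search(r"(?<=6\) ).*?Answer.*", sample, re.DOTALL)

print(dot_only)                      # no match object
print(repr(any_char.group()))        # question 6 only
print("7) Next" in dotall.group())   # the next question leaked into the match
```

Only the `[\s\S]` variant stops at the end of the Answer line, which is why `re.DOTALL` is dropped once the whole file is searched as a single string.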
Full code snippet:
import re

searchCounter = 6  # question number to look up, as in the sample above
with open('StratMasterKey.txt', 'rt') as myfile:
    contents = myfile.read()  # read the whole file as one string
question_pattern = re.compile(rf'(?<={searchCounter}\) )[\s\S]*?Answer.*')
result = question_pattern.search(contents)
if result:
    print(result.group())
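If the goal is to fill a dictionary like `bookDict` from the question, one possible sketch is to repeat the search for each question number against the full contents. The helper name `extract_questions` is hypothetical, and two things here are assumptions rather than part of the original answer: questions are numbered consecutively from 1, and each number starts a line. The `^` anchor with `re.MULTILINE` and the capture group replace the lookbehind so that `6) ` does not also match inside `16) `.

```python
import re


def extract_questions(contents):
    """Map each question number to its text, up to and including the Answer line.

    Hypothetical helper: assumes questions are numbered 1, 2, 3, ... and that
    every question number appears at the start of a line.
    """
    questions = {}
    number = 1
    while True:
        # '^' with re.MULTILINE ties the number to the start of a line;
        # '[\s\S]*?' lazily spans line breaks up to the first 'Answer',
        # and '.*' then consumes the rest of that line only.
        pattern = re.compile(rf"^{number}\) ([\s\S]*?Answer.*)", re.MULTILINE)
        match = pattern.search(contents)
        if not match:
            break  # stop at the first question number that is not found
        questions[number] = match.group(1)
        number += 1
    return questions


# Made-up two-question input in the same layout as the sample in the post.
demo = (
    "1) First question?\nA) yes\nB) no\nAnswer: A\n"
    "2) Second question?\nA) up\nB) down\nAnswer: B\n"
)
book = extract_questions(demo)
print(book[2])
```

With the real file, the call would be `extract_questions(contents)` after reading the file as shown above.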
Upvotes: 2