Reputation: 93
I am trying to find every question phrase with a Python regex. Basically, I need to find an initial punctuation mark and match everything after it up to the question mark, avoiding other punctuation in the middle.
So I came up with this code:
import re
questionRegex = re.compile(r'[?.!][A-Za-z\s]*\?')
and then I use this regex to find questions inside this text:
text = '''
Maybe the barista’s looking at me because she thinks I’m attractive. I am in my blue shirt. So she has stringy hair? Who am I to complain about stringy hair? Who do I think I am? Cary Grant?
And now John was doing temp work at the law firm of Fleurstein and Kaplowitz to get himself righted again. He had a strong six-month plan: he would save some money to pay Rebecca’s parents back for the house and be able to take some time off to focus on his writing—on his painting. In a few months, he would be back on his feet, probably even engaged to someone new. Maybe even that barista. Yes, almost paradoxically, temp work provided John with the stability he craved.
This is shit. It is utter shit. What are you talking about? Are you serious about this?
'''
like this:
process = questionRegex.findall(text)
but the result I get is this:
. So she has stringy hair?
? Who do I think I am?
. What are you talking about?
The problem is that there are six questions in this text, so this regex fails to catch three of them.
What is wrong with my code, and why doesn't it catch those questions like the others?
Upvotes: 0
Views: 2188
Reputation: 1
If a text starts with a question, the regular expressions in the other answers will skip that first question. To solve this, add a question mark after the \s.
The regex:
/\s?[A-Za-z\s]*\?/
and in the latter a question mark after the lookbehind group
/(?<=[\?\.\!]\s)?[^\?\n\.]+?\?/
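To make the start-of-text case concrete, here is a minimal Python sketch. It does not use the optional-group form above; instead it swaps in an explicit alternative, `(?:^|(?<=[?.!]\s))`, which accepts either the start of the string or the lookbehind. The names `lookbehind_only`, `start_or_lookbehind`, and `sample` are made up for illustration, and the sample text is not from the question.

```python
import re

# Lookbehind-only pattern: a match must be preceded by ". ", "? " or "! ".
lookbehind_only = re.compile(r'(?<=[?.!]\s)[^?\n.]+?\?')

# Hypothetical variant (an assumption, not from the thread):
# also allow a match that begins at the very start of the text.
start_or_lookbehind = re.compile(r'(?:^|(?<=[?.!]\s))[^?\n.]+?\?')

sample = "Who am I? Cary Grant?"  # made-up text that begins with a question

skipped_first = lookbehind_only.findall(sample)
with_first = start_or_lookbehind.findall(sample)

print(skipped_first)  # the leading question is missing here
print(with_first)     # both questions are found here
```

Without `re.MULTILINE`, `^` only matches at the start of the whole string, so this variant changes nothing for questions in the middle of the text.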
Upvotes: 0
Reputation: 9
You can try this:
(?<=[\?\.\!]\s)[^\?\n\.]+?\?
Matches:
So she has stringy hair?
Who am I to complain about stringy hair?
Who do I think I am?
Cary Grant?
What are you talking about?
Are you serious about this?
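For reference, this is roughly how the lookbehind pattern could be run in Python. It is a sketch on a shortened, simplified sample rather than the full passage from the question, and `question_regex` is just an illustrative name. The key point is that the lookbehind checks for punctuation plus whitespace without consuming them, so the "?" that ends one question can still serve as the lead-in for the next.

```python
import re

# Shortened stand-in for the question's text (an assumption, not the full passage).
text = (
    "I am in my blue shirt. So she has stringy hair? "
    "Who am I to complain about stringy hair? Who do I think I am? Cary Grant?\n"
    "It is utter nonsense. What are you talking about? Are you serious about this?\n"
)

# (?<=[?.!]\s)  zero-width check: preceded by ., ? or ! and one whitespace character
# [^?\n.]+?     the body of the question, with no "?", newline or "." inside
# \?            the closing question mark
question_regex = re.compile(r'(?<=[?.!]\s)[^?\n.]+?\?')

questions = question_regex.findall(text)
for q in questions:
    print(q)
```

Note that a question at the very beginning of the text has nothing before it for the lookbehind to check, so it would not be matched by this pattern.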
Upvotes: 0
Reputation: 2254
I figured out why your regex pattern is unable to return all the results.
The following strings:
In fact, every question in the text comes right after a whitespace character.
So rather than specifying the group [?.!], you can simply use \s.
The pattern becomes:
In [20]: pattern = re.compile(r'\s[A-Za-z\s]*\?')
In [21]: pattern.findall(text)
Out[21]:
[' So she has stringy hair?',
' Who am I to complain about stringy hair?',
' Who do I think I am?',
' Cary Grant?',
' What are you talking about?',
' Are you serious about this?']
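Here is a small sketch of why the original pattern skips every other question, as far as I can tell: `findall` returns non-overlapping matches, and the original pattern consumes the "?" that ends one question, so that same "?" is no longer available as the leading punctuation of the next one. The sample text is shortened from the question, and the variable names are only for illustration.

```python
import re

# Shortened sample based on the question's text (an assumption, not the full passage).
text = ("I am in my blue shirt. So she has stringy hair? "
        "Who am I to complain? Who do I think I am? Cary Grant?")

original = re.compile(r'[?.!][A-Za-z\s]*\?')     # leading punctuation is consumed
whitespace_led = re.compile(r'\s[A-Za-z\s]*\?')  # only leading whitespace is consumed

# Each match of the original pattern swallows the closing "?", which the
# following question would have needed as its own starting punctuation.
original_matches = original.findall(text)

# With \s as the lead-in, consecutive questions no longer compete for the "?".
whitespace_matches = [m.strip() for m in whitespace_led.findall(text)]

print(original_matches)
print(whitespace_matches)
```

One caveat with the `\s` version: since any whitespace can start a match, a sentence with inner punctuation such as "Well, are you sure?" would only yield the fragment " are you sure?".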
Upvotes: 1