cvk60

Python regexp, how to match sentences


I am trying to match "sentence two foo" and "sentence four foo" in the following string:

sentence one foo  sentence two foo   sentence three foo    sentence four foo   sentence five

Note that a sentence can contain several spaces, but never two in a row, and that each sentence is separated from the preceding and following one by at least two consecutive spaces.
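
Because the boundary rule is this regular, one alternative (a different technique from the single-pattern match asked about here) is to split on runs of two or more spaces and then pick out the sentences you need. A minimal sketch; the variable names are illustrative, and the sample string assumes two spaces after "sentence one foo", as the rule above requires:

```python
import re

# Sample string with the separators spelled out: 2, 3, 4 and 3 spaces.
TEXT = (
    "sentence one foo" + " " * 2
    + "sentence two foo" + " " * 3
    + "sentence three foo" + " " * 4
    + "sentence four foo" + " " * 3
    + "sentence five"
)

# Two or more consecutive spaces mark a sentence boundary; the single
# spaces inside a sentence are left alone.
sentences = re.split(r" {2,}", TEXT)
print(sentences)
# ['sentence one foo', 'sentence two foo', 'sentence three foo', 'sentence four foo', 'sentence five']

# Keep only the sentences of interest (str.startswith accepts a tuple).
wanted = [s for s in sentences if s.startswith(("sentence two", "sentence four"))]
print(wanted)  # ['sentence two foo', 'sentence four foo']
```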

I am using the following pattern for matching:

.*(sentence two.*  ).*(sentence four.*  )

Note the double space after each of the two sentences.
The problem, as you well know, is that the matching engine is greedy: the .* inside group(1) runs past the end of sentence two, up to the last double space before sentence four. So group(1) will be more than I want (it swallows sentence three too), and neither group stops where its sentence ends. What I need is "sentence two foo" in group(1) and "sentence four foo" in group(2).
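
To see what the greedy pattern actually captures, here is a small check with re.search (a sketch; the sample string is rebuilt with explicit space counts and assumes two spaces after "sentence one foo"). Each .* first grabs as much as it can and gives characters back only when the rest of the pattern would otherwise fail, so group(1) ends at the last double space that still leaves room for "sentence four":

```python
import re

TEXT = (
    "sentence one foo" + " " * 2
    + "sentence two foo" + " " * 3
    + "sentence three foo" + " " * 4
    + "sentence four foo" + " " * 3
    + "sentence five"
)

# The original pattern: every .* is greedy.
m = re.search(r".*(sentence two.*  ).*(sentence four.*  )", TEXT)

# group(1) runs on through sentence three and ends with the four spaces
# before "sentence four"; group(2) keeps the three spaces after its "foo".
print(repr(m.group(1)))
print(repr(m.group(2)))
```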

I have read the posts about the non-greedy operator "?", but I'm having problems applying it to the double spaces (which, incidentally, don't have to be double; there can also be three, four, etc.).

I tried:

.*(sentence two.*)(  )?.*(sentence four.*)(  )?

and took group(1) and group(3), but it doesn't seem to make any difference.
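
A quick way to see why this attempt changes nothing (a sketch, same assumed sample string): the optional (  )? groups are allowed to match nothing, so the greedy .* inside groups 1 and 3 behaves exactly as before, and group 3, with nothing required after it, runs on to the end of the string:

```python
import re

TEXT = (
    "sentence one foo" + " " * 2
    + "sentence two foo" + " " * 3
    + "sentence three foo" + " " * 4
    + "sentence four foo" + " " * 3
    + "sentence five"
)

# The second attempt: the ? makes each double-space group optional.
m = re.search(r".*(sentence two.*)(  )?.*(sentence four.*)(  )?", TEXT)

print(repr(m.group(1)))        # still includes sentence three
print(repr(m.group(3)))        # runs on through "sentence five"
print(m.group(2), m.group(4))  # None None: both optional groups matched nothing
```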
Any help is greatly appreciated.

Thanks
/Andrea

Answers (1)

user1919238

The non-greedy operator should be applied to the part that grabs the sentences, not the double spaces:

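/(sentence two.*?)  .*(sentence four.*?)  /

This works because you want to match the shortest possible amount of text before encountering a double space. The second group needs its own trailing double space as well: at the very end of a pattern, .*? is satisfied by matching nothing, so without it group(2) would capture only "sentence four".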
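
In Python the slashes are not part of the syntax; the pattern goes in a raw string passed to the re module. A minimal sketch of the answer's approach, with a double space required after both groups, plus a variant (an assumption beyond the answer) that accepts two or more spaces, as the question asks, and also lets the second sentence end the string:

```python
import re

TEXT = (
    "sentence one foo" + " " * 2
    + "sentence two foo" + " " * 3
    + "sentence three foo" + " " * 4
    + "sentence four foo" + " " * 3
    + "sentence five"
)

# Each lazy .*? stops as soon as the following two spaces can match.
m = re.search(r"(sentence two.*?)  .*(sentence four.*?)  ", TEXT)
print(m.group(1))  # sentence two foo
print(m.group(2))  # sentence four foo

# Variant: " {2,}" accepts any run of two or more spaces as the separator,
# and the "$" alternative lets the second sentence sit at the end of the string.
pattern = re.compile(r"(sentence two.*?) {2,}.*(sentence four.*?)(?: {2,}|$)")
m2 = pattern.search("sentence two foo  sentence four foo")
print(m2.groups())  # ('sentence two foo', 'sentence four foo')
```

The lazy quantifier sits on the sentence text, and the separator that follows is what tells it where to stop.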
