Reputation: 225
I need a regex that matches a sentence with the following pattern:
The 1st part is an occurrence of one of several words (e.g. passed, died).
The 2nd part is the date in that sentence.
The 3rd part: the match must stay before the sentence delimiter (dot/full stop).
Example:
Worth Scattergood (Dee) Lea passed on Thursday, July 28, 2022, Worth Scattergood (Dee) Lea passed away unexpectedly at age 88 with her three daughters at her side. Dee was born on April 26, 1934, in Radnor, Pennsylvania.
Here I need the result: July 28, 2022
But it should not match or find any result in the following sentence:
Worth Scattergood (Dee) Lea passed on Thursday. Dee was born on April 26, 1934, in Radnor, Pennsylvania.
I tried the following expression, but it is wrong because it matches up to the second sentence:
(passed|died)(.*?)(\w+)\d{1,2},?\s?\d{4}
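To make the problem concrete, here is a small Python sketch that runs the attempt above against the sentence that should not match. Python is only an assumption for illustration (no regex flavor is stated), and the variable names are made up for this example.

```python
import re

# The pattern from the attempt above, unchanged.
attempt = re.compile(r"(passed|died)(.*?)(\w+)\d{1,2},?\s?\d{4}")

# The sentence that should NOT produce a date.
text = ("Worth Scattergood (Dee) Lea passed on Thursday. "
        "Dee was born on April 26, 1934, in Radnor, Pennsylvania.")

m = attempt.search(text)
# Full text of the match, or None if nothing matched.
overshoot = m.group(0) if m else None
print(overshoot)
```

The lazy `.*?` is free to consume the full stop, so the match runs from "passed" in the first sentence to "1934" in the second.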
Upvotes: 2
Views: 89
Reputation: 626845
You can use
\b(?:passed|died)\b[^.?!]*?\b(\w+\s*\d{1,2},\s?\d{4})(?!\d)
See the regex demo.
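As a quick check, the pattern above can be tried in Python, whose re module supports every construct it uses. This is only a sketch: Python is an assumed target, and the helper name find_date is hypothetical.

```python
import re

# Same pattern as above; the date is captured in Group 1.
date_after_keyword = re.compile(
    r"\b(?:passed|died)\b[^.?!]*?\b(\w+\s*\d{1,2},\s?\d{4})(?!\d)"
)

def find_date(sentence):
    """Return the date following passed/died within one sentence, or None."""
    m = date_after_keyword.search(sentence)
    return m.group(1) if m else None

# The two sample texts from the question.
with_date = ("Worth Scattergood (Dee) Lea passed on Thursday, July 28, 2022, "
             "Worth Scattergood (Dee) Lea passed away unexpectedly at age 88 "
             "with her three daughters at her side. "
             "Dee was born on April 26, 1934, in Radnor, Pennsylvania.")
without_date = ("Worth Scattergood (Dee) Lea passed on Thursday. "
                "Dee was born on April 26, 1934, in Radnor, Pennsylvania.")

print(find_date(with_date))
print(find_date(without_date))
```

The first call yields July 28, 2022, and the second yields None, because `[^.?!]*?` cannot step over the full stop after "Thursday".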
Details
\b(?:passed|died)\b - a word boundary, a non-capturing group matching either passed or died (as whole words) and a word boundary
[^.?!]*? - zero or more chars other than ., ! and ?, as few as possible
\b - a word boundary
(\w+\s*\d{1,2},\s?\d{4}) - Group 1: one or more word chars, zero or more whitespaces, one or two digits, comma, an optional whitespace, and four digits
(?!\d) - no digit immediately on the right is allowed.
Upvotes: 0
Reputation: 785156
You can match the keywords passed or died and then allow up to 3 space-separated substrings before matching the date:
\b(?>passed|died)(?>\h+\S+){0,3}\h+\K\w+\h+\d{1,2},\h*\d{4}\b
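The pattern above is PCRE-flavored. Python's built-in re module does not support \h or \K, so the following is only a rough adaptation under stated assumptions, not the pattern itself: \h is replaced with [ \t], \K is replaced with a capture group, and the atomic groups are written as plain non-capturing groups. The helper name find_date_near_keyword is hypothetical.

```python
import re

# Adaptation of the pattern above for Python's re module:
#   \h       -> [ \t]   (space or tab only; an assumption, \h covers more)
#   \K       -> capture group around the date
#   (?>...)  -> (?:...) (plain non-capturing group)
adapted = re.compile(
    r"\b(?:passed|died)(?:[ \t]+\S+){0,3}[ \t]+(\w+[ \t]+\d{1,2},[ \t]*\d{4})\b"
)

def find_date_near_keyword(sentence):
    """Return a date at most 3 words after passed/died, or None."""
    m = adapted.search(sentence)
    return m.group(1) if m else None

# The two sample texts from the question (first one shortened).
with_date = "Worth Scattergood (Dee) Lea passed on Thursday, July 28, 2022, at age 88."
without_date = ("Worth Scattergood (Dee) Lea passed on Thursday. "
                "Dee was born on April 26, 1934, in Radnor, Pennsylvania.")

print(find_date_near_keyword(with_date))
print(find_date_near_keyword(without_date))
```

Note the design difference from a delimiter-based pattern: this one does not look for a full stop at all. It rejects the second sample because the date sits more than three space-separated substrings away from the keyword.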
Explanation:
(?>...): An atomic group
\b: Word boundary
(?>passed|died): Match passed or died
(?>\h+\S+){0,3}: Match 0 to 3 space-separated substrings
\h+: Match 1+ horizontal whitespaces
\K: Reset the start of the reported match (everything matched so far is dropped from the result)
\w+: Match month name
\h+: Match 1+ horizontal whitespaces
\d{1,2}: Match the day of the month, 1 or 2 digits
,\h*: Match comma followed by 0 or more horizontal whitespaces
\d{4}\b: Match 4 digit year followed by word boundary
Upvotes: 4