I'm trying to extract dates and event numbers from a section of a web page. Here's the regex I'm trying to use:
Event \d+ begins (.+?) ((Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) PST|PDT
It should match text like this:
Event 4 begins for small business owners on Thursday, July 20, at 5:00 p.m. PDT in North America.
The key information I want here is the date for the correct event, which in this case is Thursday, July 20, at 5:00 p.m.
What about this regex is keeping it from matching this date? I've been through it several times without spotting the problem, so I need a second pair of eyes.
Here's a regex101 example: https://regex101.com/r/oJyLld/3/
The only major problem I found with your regex was at the end:
PST|PDT
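A quick way to see the effect is to run your pattern, unchanged, against the sample line. The sketch below uses Python's `re` module; that choice of language is my assumption, since the question doesn't say what the scraper is written in.

```python
import re

# The pattern from the question, unchanged. The unparenthesized | at the
# end splits the WHOLE pattern into two alternatives:
#   "Event ... PST"   OR   "PDT"
original = r"Event \d+ begins (.+?) ((Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) PST|PDT"

text = ("Event 4 begins for small business owners on Thursday, "
        "July 20, at 5:00 p.m. PDT in North America.")

m = re.search(original, text)
print(m.group(0))  # -> PDT
print(m.groups())  # -> (None, None, None)
```

The whole match is only the bare `PDT`, and all three capture groups are `None` because the alternative that contains them never matched.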
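Alternation has the lowest precedence of any regex operator, so this unparenthesized | splits the entire pattern in two. Your pattern actually says to match either the line of interest ending in PST, or the isolated string PDT. Since your sample line ends in PDT, the first alternative can never succeed, and the only thing that matches is the bare PDT. If you intend to match either PST or PDT, the alternation should be wrapped in parentheses, as you did with the day abbreviations. The following pattern seems to work: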
Event \d+ begins (.+?) ((Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) (PST|PDT)
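Here is the same check with the corrected pattern, again as a Python sketch (the language is my assumption). With the time zone grouped, the whole pattern must match, and the date lands in capture group 2:

```python
import re

# Wrapping PST|PDT in parentheses confines the alternation to the time zone.
fixed = r"Event \d+ begins (.+?) ((Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) (PST|PDT)"

text = ("Event 4 begins for small business owners on Thursday, "
        "July 20, at 5:00 p.m. PDT in North America.")

m = re.search(fixed, text)
print(m.group(1))  # -> for small business owners on
print(m.group(2))  # -> Thursday, July 20, at 5:00 p.m.
print(m.group(4))  # -> PDT
```

Group 3 holds only the day abbreviation (`Thu`), which is the nested capture the next version gets rid of.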
Actually, we can improve on this further:
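Event \d+ begins (.+?) ((?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) (?:PST|PDT)
This second version is an improvement because the day and time-zone alternations are now non-capturing groups, written (?:...), since you apparently do not need them on their own. The date itself stays in a capturing group, because that is the information you are after. Skipping captures you don't need means the regex engine may be able to match slightly faster, and it also removes the nested capture of the day abbreviation that sat inside the date group in your original pattern.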
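A Python sketch of the non-capturing style follows. It goes one step beyond the pattern above, as a hypothetical variant: since the question also mentions wanting event numbers, `Event (\d+)` captures the number, and the audience text between "begins" and the day is matched with a plain `.+?` instead of being captured. That leaves exactly two capture groups, the event number and the date.

```python
import re

# Hypothetical variant: only the event number and the date are captured.
# (?:...) groups the day and time-zone alternations without capturing them.
pattern = re.compile(
    r"Event (\d+) begins .+? ((?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) (?:PST|PDT)"
)

text = ("Event 4 begins for small business owners on Thursday, "
        "July 20, at 5:00 p.m. PDT in North America.")

m = pattern.search(text)
event_number, date = m.groups()
print(event_number)  # -> 4
print(date)          # -> Thursday, July 20, at 5:00 p.m.
```

Unpacking `m.groups()` into two names works only because the non-capturing groups don't add entries to the result.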