Ben

Reputation: 62384

Why won't regex match expected result

I'm trying to extract dates and event numbers from a section of a web page. Here's the regex I'm trying to use:

Event \d+ begins (.+?) ((Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) PST|PDT

Event 4 begins for small business owners on Thursday, July 20, at 5:00 p.m. PDT in North America.

The key information I want here is the date for the correct event, which in this case is Thursday, July 20, at 5:00 p.m.

What about this regex is causing it not to match this date? I've been through this several times and am not seeing it and need a second pair of eyes.

Here's a regex101 example: https://regex101.com/r/oJyLld/3/

Upvotes: 1

Views: 58

Answers (1)

Tim Biegeleisen

Reputation: 521239

The only major problem I found with your regex was at the end:

PST|PDT

Because | has the lowest precedence in a regex, your pattern actually says to match either the whole line of interest ending in PST, or the isolated string PDT. If you intend to match either PST or PDT, the alternation should be wrapped in parentheses, as you did with the day abbreviations. The following pattern seems to work:

Event \d+ begins (.+?) ((Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) (PST|PDT)
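
To see the difference concretely, here is a minimal sketch in Python's re module (the question does not say which language is in use, so Python is an assumption, as are the variable names). It runs both the original and the corrected pattern against the sample sentence from the question:

```python
import re

# Sample sentence taken from the question.
text = ("Event 4 begins for small business owners on Thursday, July 20, "
        "at 5:00 p.m. PDT in North America.")

# Original pattern: the top-level | splits the whole expression into
# "everything up to PST" OR the bare string "PDT".
broken = re.compile(
    r"Event \d+ begins (.+?) ((Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) PST|PDT")

# Corrected pattern: the alternation is confined to the time zone.
fixed = re.compile(
    r"Event \d+ begins (.+?) ((Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) (PST|PDT)")

broken_match = broken.search(text)
fixed_match = fixed.search(text)

print(broken_match.group(0))  # PDT (only the second alternative matched)
print(broken_match.group(2))  # None (the date group never participated)
print(fixed_match.group(2))   # Thursday, July 20, at 5:00 p.m.
```

With the original pattern the match succeeds, but only on the three characters PDT, which is why the date group comes back empty.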

Actually we can further improve upon the above:

Event \d+ begins (.+?) ((?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) (?:PST|PDT)

This second version of your regex is an improvement because it does not capture the alternations themselves (since you apparently do not need them), while still capturing the date you are after in the second group. Fewer capture groups means that the regex engine might be able to match faster. I also removed the nested capture group around the day abbreviations that you originally wrote.
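
As a rough illustration of non-capturing groups, here is a sketch in Python (an assumed language, since the question does not name one). It keeps capturing groups only around the pieces the question asks for, the event number and the date, and uses (?:...) for the day alternation. The group names number, date and zone are illustrative and not part of the original pattern:

```python
import re

# Sample sentence taken from the question.
text = ("Event 4 begins for small business owners on Thursday, July 20, "
        "at 5:00 p.m. PDT in North America.")

# (?:...) groups the day abbreviations without capturing them.
# Named groups (hypothetical names) capture only what we want to keep.
pattern = re.compile(
    r"Event (?P<number>\d+) begins .+? "
    r"(?P<date>(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat).+?) "
    r"(?P<zone>PST|PDT)"
)

match = pattern.search(text)
event_number = int(match.group("number"))
event_date = match.group("date")
time_zone = match.group("zone")

print(event_number, event_date, time_zone)
```

Whether dropping the extra groups makes a measurable speed difference depends on the engine and the input, so treat the performance benefit as a possibility, not a guarantee.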


Upvotes: 2
