David Metcalfe

Reputation: 2411

Matching years at end of string not enclosed in brackets

I'm trying to match strings that have a year at the end of them, but only when they're not enclosed in brackets. Negative lookaheads and lookbehinds don't seem to help.

Here's some example text. I only want the first two lines matched, and not the third.

Example one 2015
Example two 2017
Example three (2009)

If I use something like (?<!\(\d{4}\)$) or (?!\(\d{4}\)$) then I get 54 matches instead of the expected 2 (one for each of the first two lines).
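
The match count can be reproduced with a short Python sketch. It assumes the test input is exactly the three example lines joined by newlines (the real input is only visible in the screenshot), and the variable names are illustrative. A pattern that consists only of a lookaround consumes no characters, so it produces an empty match at every position where the assertion holds. With this input there are 55 positions and the lookahead fails at only one of them, just before "(2009)", which is consistent with the 54 matches reported.

```python
import re

# Assumed input: the three example lines joined by newlines.
text = "Example one 2015\nExample two 2017\nExample three (2009)"

# The whole pattern is a negative lookahead, so every match is empty.
lookahead_only = re.compile(r"(?!\(\d{4}\)$)", re.MULTILINE)
empty_matches = lookahead_only.findall(text)

print(len(text) + 1)       # number of candidate positions in the text
print(len(empty_matches))  # positions where the negative lookahead holds
```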

What am I doing wrong?

Upvotes: 1

Views: 64

Answers (3)

Ryszard Czech

Reputation: 18611

Restrict the match to years in the 1900s and 2000s:

\b(?:19|20)\d\d$

Or, any four digits as a whole word at the end of string:

\b\d{4}$

See proof.

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    19                       '19'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    20                       '20'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \d                       a digit (0-9)
--------------------------------------------------------------------------------
  \d                       a digit (0-9)
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
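
Explanation of the second pattern, \b\d{4}$:

--------------------------------------------------------------------------------
  \b                       word boundary
--------------------------------------------------------------------------------
  \d{4}                    four digits
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string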
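
As a rough check, both patterns can be run against the question's sample text in Python. This is a sketch rather than a definitive test: the variable names are made up, and re.MULTILINE is an assumption so that $ matches at the end of each line instead of only at the end of the whole string.

```python
import re

# The three sample lines from the question, joined by newlines.
text = "Example one 2015\nExample two 2017\nExample three (2009)"

# re.MULTILINE lets $ match at the end of every line.
century_year = re.compile(r"\b(?:19|20)\d\d$", re.MULTILINE)
any_year = re.compile(r"\b\d{4}$", re.MULTILINE)

century_matches = century_year.findall(text)
any_matches = any_year.findall(text)

print(century_matches)  # ['2015', '2017']
print(any_matches)      # ['2015', '2017']
```

The third line is skipped because the closing bracket sits between the digits and the end of the line, so $ cannot match right after 2009.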

Upvotes: 2

rturrado

Reputation: 8064

Try this:

^(.*[^\(]\d{4}[^\)]?)$

  • ^ Start of line
  • ( Start of capturing group
  • .* Anything zero or more times
  • [^\(] Anything but an opening parenthesis
  • \d{4} Four-digit year
  • [^\)]? Anything but a closing parenthesis (optionally)
  • ) End of capturing group
  • $ End of line

https://regex101.com/r/zr2pfv/1
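
A quick way to try this outside regex101 is a few lines of Python. This is only a sketch: the variable names are illustrative, and re.MULTILINE is assumed so that ^ and $ anchor to each line of the sample text.

```python
import re

# The three sample lines from the question, joined by newlines.
text = "Example one 2015\nExample two 2017\nExample three (2009)"

# re.MULTILINE makes ^ and $ match at line boundaries.
line_with_year = re.compile(r"^(.*[^\(]\d{4}[^\)]?)$", re.MULTILINE)

# findall returns the contents of the single capturing group,
# which here is the whole matching line.
matched_lines = line_with_year.findall(text)

print(matched_lines)  # ['Example one 2015', 'Example two 2017']
```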

Upvotes: 2

BahmanM

Reputation: 1445

You could anchor on what immediately follows the year, allowing only optional whitespace before the end of the line. For example:

\d{4}\s*$

This matches any line whose last non-whitespace characters are four digits. It does not require exactly four: a longer run of digits such as 12345 also matches on its last four digits.
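
A minimal Python sketch of this check, applied one line at a time so that \s* cannot run across a line break. The helper name ends_with_year and the extra test lines (trailing spaces, a five-digit number) are assumptions added for illustration.

```python
import re

YEAR_AT_END = re.compile(r"\d{4}\s*$")

def ends_with_year(line: str) -> bool:
    """Return True if the line ends in four digits plus optional whitespace."""
    return YEAR_AT_END.search(line) is not None

lines = [
    "Example one 2015",
    "Example two 2017  ",    # trailing spaces are allowed by \s*
    "Example three (2009)",  # the closing bracket blocks the match
    "Item 12345",            # matches on its last four digits
]
results = [ends_with_year(line) for line in lines]

print(results)  # [True, True, False, True]
```

If longer numbers should be rejected, adding a \b before \d{4} is one option.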

Upvotes: 2
