Tim Dodgson

Reputation: 31

Regex find characters in string

Given the following string

2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN

I am trying to find all instances of a date followed by one or two characters, e.g. 2010-01-01XD, but not where the characters are XX.

I have tried

(2010-01-02[^X]{2})|(2010-01-08[^X]{2})|(2010-01-07[^X]{2})|(2010-01-05[^X]{2})|(2010-01-15[^X]{2})

This works only if neither character is X. I have also tried

(2010-01-02[^X]{1,2})|(2010-01-08[^X]{1,2})|(2010-01-07[^X]{1,2})|(2010-01-05[^X]{1,2})|(2010-01-15[^X]{1,2})

This works for DX but not XD.
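To make that behaviour concrete, here is a minimal sketch of one alternative from the second attempt. Python's re module and the helper name matched are assumptions for illustration only; the question does not say which regex engine is in use.

```python
import re

# One alternative from the second attempt in the question.
attempt = re.compile(r"2010-01-02[^X]{1,2}")


def matched(text):
    """Return the matched substring, or None when nothing matches."""
    m = attempt.search(text)
    return m.group(0) if m else None


print(matched("2010-01-02DX"))  # 2010-01-02D  (stops before the X)
print(matched("2010-01-02XD"))  # None         (the first character is X)
print(matched("2010-01-02ND"))  # 2010-01-02ND
```

Because [^X] rejects an X in either position, XD can never match, and DX matches only as far as the D.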

So trying to be a little clearer

2010-01-01XD
2010-01-01DX
2010-01-01ND

All above should be picked up

2010-01-01XX

And this ignored

Upvotes: 0

Views: 131

Answers (5)

Bohemian

Reputation: 425258

A negative lookahead is the easiest way to assert that the letters are not XX. You can also simplify the alternation by factoring out the part shared by all the dates you're trying to match, which gives this shorter regex:

2010-01-(02|08|07|05|15)(?!XX)[A-Z]{1,2}
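As a rough check, the pattern can be run against the string from the question. This is a sketch in Python, which is an assumption since the question names no engine; SAMPLE and find_matches are illustrative names.

```python
import re

# The string from the question.
SAMPLE = "2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN"

PATTERN = re.compile(r"2010-01-(02|08|07|05|15)(?!XX)[A-Z]{1,2}")


def find_matches(text):
    """Return each full match (date plus letters) found in text."""
    # finditer + group(0) gives the whole match; findall would return
    # only the captured day number because of the capturing group.
    return [m.group(0) for m in PATTERN.finditer(text)]


print(find_matches(SAMPLE))  # ['2010-01-05DN']
```

Only 2010-01-05DN is returned: 2010-01-02 is in the list of dates but is followed by XX, and the other dates in the sample are not in the alternation.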

Upvotes: 0

l'L'l

Reputation: 47274

You could likely use a simple pattern with a negative lookahead, such as this:

\d{4}-\d{2}-\d{2}(?!XX)[A-Z]{1,2}

example: http://regex101.com/r/dI1nW4/2
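The four cases listed in the question can be checked against this pattern with a short sketch. Python is assumed here, and CASES and accepted are hypothetical names used only for the example.

```python
import re

PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}(?!XX)[A-Z]{1,2}")

# The four cases spelled out in the question.
CASES = ["2010-01-01XD", "2010-01-01DX", "2010-01-01ND", "2010-01-01XX"]


def accepted(cases):
    """Return the cases that the pattern matches in full."""
    return [c for c in cases if PATTERN.fullmatch(c)]


print(accepted(CASES))  # ['2010-01-01XD', '2010-01-01DX', '2010-01-01ND']
```

The lookahead runs before any letter is consumed, so a single X in either position is fine and only the exact pair XX is rejected.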

To allow any non-digit characters, including Unicode letters (with the exception of XX), you could use:

\d{4}-\d{2}-\d{2}(?!XX)\D{1,2}

example: http://regex101.com/r/yB5fI0/1
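The difference between the two patterns shows up once the trailing characters are not uppercase ASCII letters. A small sketch, assuming Python's re module; the test string and the names LETTERS, NON_DIGITS and find_all are made up for illustration.

```python
import re

LETTERS = re.compile(r"\d{4}-\d{2}-\d{2}(?!XX)[A-Z]{1,2}")
NON_DIGITS = re.compile(r"\d{4}-\d{2}-\d{2}(?!XX)\D{1,2}")

# Hypothetical input: lowercase letters, the forbidden XX, and punctuation.
TEXT = "2010-01-01xd2010-01-02XX2010-01-03N-"


def find_all(pattern, text):
    """Return every full match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


print(find_all(LETTERS, TEXT))     # ['2010-01-03N']
print(find_all(NON_DIGITS, TEXT))  # ['2010-01-01xd', '2010-01-03N-']
```

Note that \D accepts any non-digit, so punctuation and whitespace also pass; whether that is acceptable depends on the data.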

Upvotes: 2

user557597

Reputation:

Easiest way is to use a lookahead assertion (if available).

 # (2010-01-01|2010-01-02|2010-01-08|2010-01-07|2010-01-05|2010-01-15)(?!XX)(?i:([a-z]){1,2})

 (                        # (1 start), One of these dates
      2010-01-01
   |  2010-01-02
   |  2010-01-08
   |  2010-01-07
   |  2010-01-05
   |  2010-01-15
 )                        # (1 end)
 (?! XX )                 # Look ahead assertion, cannot match XX here
 (?i:                     # 1 or 2 of any U/L case letter
      ( [a-z] ){1,2}           # (2)
 )
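The expanded form above can be used more or less directly in engines with a free-spacing mode. Here is a sketch using Python's re.VERBOSE flag; Python is an assumption, as are the names DATE_CODE, SAMPLE and describe.

```python
import re

DATE_CODE = re.compile(
    r"""
    (                    # group 1: one of these dates
          2010-01-01
       |  2010-01-02
       |  2010-01-08
       |  2010-01-07
       |  2010-01-05
       |  2010-01-15
    )
    (?! XX )             # lookahead: the next two characters must not be XX
    (?i:                 # case-insensitive inside this group only
        ( [a-z] ){1,2}   # group 2: repeated, so it keeps the last letter
    )
    """,
    re.VERBOSE,
)

# The string from the question.
SAMPLE = "2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN"


def describe(text):
    """Return (full match, date, last letter) for every match in text."""
    return [(m.group(0), m.group(1), m.group(2)) for m in DATE_CODE.finditer(text)]


print(describe(SAMPLE))
# [('2010-01-01XD', '2010-01-01', 'D'), ('2010-01-05DN', '2010-01-05', 'N')]
```

One thing to be aware of: the lookahead sits outside the case-insensitive group, so lowercase xx is not rejected. If that matters, the lookahead could be moved inside the (?i: ) group.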

Upvotes: 2

anubhava

Reputation: 785771

You can use this regex based on negative lookahead:

(20[0-9]{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])(?!XX)[A-Z]{2})
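A quick way to try this is the following sketch, which assumes Python's re module; SAMPLE and find_all are illustrative names. It also shows two side effects of the pattern: the month and day ranges are validated, and exactly two letters are required.

```python
import re

PATTERN = re.compile(
    r"(20[0-9]{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])(?!XX)[A-Z]{2})"
)

# The string from the question.
SAMPLE = "2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN"


def find_all(text):
    """Return every match; the single capturing group spans the whole match."""
    return PATTERN.findall(text)


print(find_all(SAMPLE))
# ['2010-01-01XD', '2010-01-03NX', '2010-01-04XD', '2010-01-05DN']
print(find_all("2010-13-01AB"))  # []  (month 13 is out of range)
print(find_all("2010-01-01D"))   # []  (only one trailing letter)
```

If a single trailing letter should also match, [A-Z]{2} would need to become [A-Z]{1,2}.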


Upvotes: 2

Niels Keurentjes

Reputation: 41968

20[0-9]{2}-[01][0-9]-[0-3][0-9]([A-Z][A-WYZ]|[A-WYZ][A-Z])
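This pattern avoids lookahead entirely: it accepts any two uppercase letters in which at least one is not X, which may help in engines without lookahead support. A minimal sketch, assuming Python and using the illustrative names SAMPLE and find_matches:

```python
import re

# [A-WYZ] is "any uppercase letter except X", so each alternative
# guarantees that one of the two positions is not X.
PATTERN = re.compile(
    r"20[0-9]{2}-[01][0-9]-[0-3][0-9]([A-Z][A-WYZ]|[A-WYZ][A-Z])"
)

# The string from the question.
SAMPLE = "2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN"


def find_matches(text):
    """Return each full match (date plus two letters) found in text."""
    return [m.group(0) for m in PATTERN.finditer(text)]


print(find_matches(SAMPLE))
# ['2010-01-01XD', '2010-01-03NX', '2010-01-04XD', '2010-01-05DN']
```

Unlike the lookahead versions, this one always requires two letters after the date.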


Upvotes: 1
