Reputation: 31
Given the following string
2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN
I am trying to find all instances of a date followed by one or two characters, i.e. 2010-01-01XD, but not where the characters are XX.
I have tried
(2010-01-02[^X]{2})|(2010-01-08[^X]{2})|(2010-01-07[^X]{2})|(2010-01-05[^X]{2})|(2010-01-15[^X]{2})
This works only if neither character is X.
I have also tried
(2010-01-02[^X]{1,2})|(2010-01-08[^X]{1,2})|(2010-01-07[^X]{1,2})|(2010-01-05[^X]{1,2})|(2010-01-15[^X]{1,2})
This works for DX but not for XD.
To be a little clearer:
2010-01-01XD
2010-01-01DX
2010-01-01ND
All of the above should be matched.
2010-01-01XX
And this one should be ignored.
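One way to see why the two attempts behave like this is to run a single-date version of each against the four cases. This is a sketch using Python's standard `re` module; narrowing the alternation to the single date 2010-01-01 is an assumption made so the patterns line up with the examples above.

```python
import re

cases = ["2010-01-01XD", "2010-01-01DX", "2010-01-01ND", "2010-01-01XX"]

# [^X]{2}: both characters must be something other than X
both_not_x = re.compile(r"2010-01-01[^X]{2}")
# [^X]{1,2}: one or two non-X characters, so it may stop after the first one
one_or_two_not_x = re.compile(r"2010-01-01[^X]{1,2}")

for case in cases:
    m1 = both_not_x.match(case)
    m2 = one_or_two_not_x.match(case)
    # Print the matched text, or None when there is no match
    print(case, m1 and m1.group(0), m2 and m2.group(0))
# 2010-01-01XD None None
# 2010-01-01DX None 2010-01-01D
# 2010-01-01ND 2010-01-01ND 2010-01-01ND
# 2010-01-01XX None None
```

The `{1,2}` version only appears to work for DX because it matches the D and stops before the X; for XD the very first character is already an X, so nothing matches.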
Upvotes: 0
Views: 131
Reputation: 425258
A negative lookahead is the easiest way to assert that the letters are not XX. You can also simplify the alternation by factoring out the part of the date shared by all the dates you're trying to match, which gives this shorter regex:
2010-01-(02|08|07|05|15)(?!XX)[A-Z]{1,2}
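A quick check against the string from the question, sketched with Python's `re` module (the variable names are illustrative). Because the pattern contains a capturing group, `re.findall` returns only the captured day, so `finditer` is used to get the whole match.

```python
import re

s = "2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN"

# Shared "2010-01-" prefix factored out; (?!XX) rejects XX before any letter is consumed
pattern = re.compile(r"2010-01-(02|08|07|05|15)(?!XX)[A-Z]{1,2}")

print([m.group(0) for m in pattern.finditer(s)])
# ['2010-01-05DN']  (2010-01-02 is followed by XX; 01, 03 and 04 are not in the list)

print(pattern.findall(s))
# ['05']  (findall returns the group, not the whole match)
```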
Upvotes: 0
Reputation: 47274
You could likely use a simple pattern with a negative lookahead, such as this:
\d{4}-\d{2}-\d{2}(?!XX)[A-Z]{1,2}
example: http://regex101.com/r/dI1nW4/2
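The same pattern can be checked outside regex101; here is a minimal Python sketch using the standard `re` module on the string from the question. The pattern has no capturing groups, so `findall` returns the full matches.

```python
import re

s = "2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN"

# Any yyyy-mm-dd shape, then one or two capital letters, as long as they are not XX
pattern = re.compile(r"\d{4}-\d{2}-\d{2}(?!XX)[A-Z]{1,2}")

print(pattern.findall(s))
# ['2010-01-01XD', '2010-01-03NX', '2010-01-04XD', '2010-01-05DN']
```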
To allow non-ASCII characters as well (still excluding XX), you could use \D, which matches any character that is not a digit:
\d{4}-\d{2}-\d{2}(?!XX)\D{1,2}
example: http://regex101.com/r/yB5fI0/1
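To illustrate the difference between the two variants, here is a small Python sketch; the sample input with accented letters is hypothetical. In Python 3, `str` patterns are Unicode-aware, so `\D` accepts any non-digit, while `[A-Z]` stays limited to ASCII capitals.

```python
import re

letters_only = re.compile(r"\d{4}-\d{2}-\d{2}(?!XX)[A-Z]{1,2}")
non_digits = re.compile(r"\d{4}-\d{2}-\d{2}(?!XX)\D{1,2}")

sample = "2010-01-01ÉÑ"  # hypothetical input with non-ASCII letters

print(letters_only.fullmatch(sample))            # None: [A-Z] is ASCII-only
print(non_digits.fullmatch(sample) is not None)  # True: \D accepts any non-digit
print(non_digits.fullmatch("2010-01-01XX"))      # None: XX is still excluded
print(non_digits.fullmatch("2010-01-01 -") is not None)  # True: spaces/punctuation pass too
```

The trade-off is that `\D` is looser than a letter class: spaces and punctuation after the date are accepted as well.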
Upvotes: 2
Reputation:
The easiest way is to use a lookahead assertion (if your regex flavor supports it).
# (2010-01-01|2010-01-02|2010-01-08|2010-01-07|2010-01-05|2010-01-15)(?!XX)(?i:([a-z]){1,2})
( # (1 start), One of these dates
2010-01-01
| 2010-01-02
| 2010-01-08
| 2010-01-07
| 2010-01-05
| 2010-01-15
) # (1 end)
(?! XX ) # Look ahead assertion, cannot match XX here
(?i: # 1 or 2 of any U/L case letter
( [a-z] ){1,2} # (2)
)
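In Python, the commented layout above can be compiled almost verbatim with `re.VERBOSE`, which ignores unescaped whitespace and `#` comments in the pattern. A sketch, assuming Python 3.6 or later for the scoped `(?i:...)` group:

```python
import re

DATE_CODE = re.compile(r"""
    (                  # (1 start), one of these dates
        2010-01-01
      | 2010-01-02
      | 2010-01-08
      | 2010-01-07
      | 2010-01-05
      | 2010-01-15
    )                  # (1 end)
    (?! XX )           # lookahead assertion, cannot match XX here
    (?i:               # 1 or 2 of any upper/lower case letter
        ( [a-z] ){1,2} # (2)
    )
""", re.VERBOSE)

s = "2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN"
for m in DATE_CODE.finditer(s):
    print(m.group(0), m.group(1), m.group(2))
# 2010-01-01XD 2010-01-01 D
# 2010-01-05DN 2010-01-05 N
```

Because the `{1,2}` repeat sits outside group 2, `group(2)` keeps only the last letter; `( [a-z]{1,2} )` would capture both. Also note that the `(?! XX )` assertion is outside the case-insensitive group, so a lowercase `xx` is still accepted.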
Upvotes: 2
Reputation: 785771
You can use this regex, based on a negative lookahead:
(20[0-9]{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])(?!XX)[A-Z]{2})
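A minimal Python sketch of this answer's pattern on the string from the question, plus two inputs it rejects. The outer group spans the whole match, so `findall` returns full matches here.

```python
import re

s = "2010-01-01XD2010-01-02XX2010-01-03NX2010-01-04XD2010-01-05DN"

# Year 20xx, month 01-12, day 01-31, then exactly two capital letters that are not XX
pattern = re.compile(
    r"(20[0-9]{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])(?!XX)[A-Z]{2})"
)

print(pattern.findall(s))
# ['2010-01-01XD', '2010-01-03NX', '2010-01-04XD', '2010-01-05DN']
print(pattern.fullmatch("2010-13-01XD"))  # None: month 13 is rejected
print(pattern.fullmatch("2010-01-01X"))   # None: [A-Z]{2} needs two letters
```

The range checks are per field only, so an impossible date such as 2010-02-31 still matches, and unlike the `{1,2}` versions a date followed by a single letter is not matched.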
Upvotes: 2
Reputation: 41968
20[0-9]{2}-[01][0-9]-[0-3][0-9]([A-Z][A-WYZ]|[A-WYZ][A-Z])
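This pattern avoids lookahead altogether: `[A-WYZ]` is any capital letter except X, so requiring at least one of the two positions to come from that class rules out XX while still allowing XD, DX and ND. That can be useful in flavors without lookahead, such as POSIX ERE. A quick Python check of the idea, using the cases from the question:

```python
import re

# Either (any letter, then non-X) or (non-X, then any letter); XX fits neither branch
pattern = re.compile(r"20[0-9]{2}-[01][0-9]-[0-3][0-9]([A-Z][A-WYZ]|[A-WYZ][A-Z])")

cases = ["2010-01-01XD", "2010-01-01DX", "2010-01-01ND", "2010-01-01XX"]
for case in cases:
    print(case, pattern.fullmatch(case) is not None)
# 2010-01-01XD True
# 2010-01-01DX True
# 2010-01-01ND True
# 2010-01-01XX False
```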
Upvotes: 1