Reputation:
I am attempting to write a program that will look for series of numbers in a string that could be interpreted as dates. To that end, I have written a regular expression which I run like this:
Dim m As MatchCollection = Regex.Matches(_string, "[0-9]{1,4}[-_ ]?[0-9]{1,2}([-_ ]?[0-9]{2,4})?")
Now, when I give it some weird string like "4_2_2012_13_39", I would expect it to return the following nine matches:
(I have a secondary step that would discard numbers 6 and 9 for not having any number in the range for a month value.) In fact, I get only two matches: "4_2_2012" and "13_39". I think it's trying not to use the same character in two matches. Is there a way that I can insist that it not do so? Thank you for any help.
Rob
Upvotes: 1
Views: 616
Reputation: 12102
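It will give you the longest match from each point where it starts searching, and it resumes searching after the end of the previous match, so you do not get every possible match (just as matching "abcdef" against ".*" will return the single match "abcdef", not all possible substrings of it, so not "a" or "f" or "bcd").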
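To make that behaviour concrete, here is a small sketch in Python rather than VB.NET (the pattern syntax used here is the same in both engines). It first runs the default non-overlapping scan on the string from the question, then wraps the pattern in a zero-width lookahead with a capture group, which is one common way to collect overlapping candidates. The lookahead still yields at most one candidate per start position, so it is not an exhaustive list of substrings. The names PATTERN, non_overlapping and overlapping are made up for this illustration.

```python
import re

# Pattern from the question; the optional group is made non-capturing
# so that the whole pattern can sit inside one capture group below.
PATTERN = r"[0-9]{1,4}[-_ ]?[0-9]{1,2}(?:[-_ ]?[0-9]{2,4})?"


def non_overlapping(text):
    """Default scan: each search resumes after the end of the previous match."""
    return [m.group(0) for m in re.finditer(PATTERN, text)]


def overlapping(text):
    """Try the pattern at every position without consuming any characters."""
    lookahead = re.compile(r"(?=(" + PATTERN + r"))")
    return [m.group(1) for m in lookahead.finditer(text)]


if __name__ == "__main__":
    sample = "4_2_2012_13_39"
    print(non_overlapping(sample))  # ['4_2_2012', '13_39']
    print(overlapping(sample))
```

Because the lookahead also fires in the middle of a run of digits, the overlapping list contains candidates such as a number with its leading digits cut off, so a filtering step like the one the question mentions would still be needed.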
Upvotes: 0
Reputation: 45096
Why do you want strings that could be interpreted as dates but are not valid dates? 2012_13_39 is not a valid date.
You could run an independent regex for each date type.
This one looks for a 4-digit year starting with 19 or 20.
The negative lookbehind and negative lookahead require that the characters on either side are not digits, so only a stand-alone number matches.
(?<!\d)(20|19)\d\d(?!\d)
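As a quick check of what the two lookarounds do, here is a minimal Python sketch (Python accepts this pattern unchanged). The helper name find_years is hypothetical.

```python
import re

# Pattern from the answer: a 4-digit year starting with 19 or 20,
# with no digit immediately before or after it.
YEAR = re.compile(r"(?<!\d)(20|19)\d\d(?!\d)")


def find_years(text):
    """Return every stand-alone 19xx/20xx number found in text."""
    return [m.group(0) for m in YEAR.finditer(text)]


if __name__ == "__main__":
    print(find_years("4_2_2012_13_39"))  # ['2012']
    print(find_years("120123"))          # [] since 2012 is surrounded by digits
```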
This one looks for a month and day:
(?<!\d)1?\d_[1-3]?\d(?!\d)
You could be even more restrictive, though, since this allows 19/39 (month 19, day 39).
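One possible tightening, sketched in Python: limit the month to 1-12 and the day to 1-31 with alternations. The STRICT_MONTH_DAY pattern below is an assumption of mine, not something given in the answer, and it still does not check the day against the month (it accepts 2_31, for example).

```python
import re

# Loose pattern from the answer: also accepts values such as 19_39.
LOOSE_MONTH_DAY = re.compile(r"(?<!\d)1?\d_[1-3]?\d(?!\d)")

# Hypothetical stricter variant: month 1-12, day 1-31, optional leading zero.
STRICT_MONTH_DAY = re.compile(
    r"(?<!\d)(?:0?[1-9]|1[0-2])_(?:0?[1-9]|[12]\d|3[01])(?!\d)"
)


def find_all(pattern, text):
    """Return the full text of every non-overlapping match of pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "4_2_2012_13_39"
    print(find_all(LOOSE_MONTH_DAY, sample))   # ['4_2', '13_39']
    print(find_all(STRICT_MONTH_DAY, sample))  # ['4_2']
```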
Year at beginning
(?<!\d)(20|19)\d\d_1?\d_[1-3]?\d(?!\d)
I am not going to build them all up for you, but these are the tools to do it.
(?!\d) should work as a boundary
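Running one regex per date type could be organised roughly like this. This is a Python sketch using only the three patterns given above; the dictionary keys and the function name find_candidates are made up for the illustration, and more layouts would be added in the same way.

```python
import re

# One pattern per date layout, copied from the answer above.
DATE_PATTERNS = {
    "year": r"(?<!\d)(20|19)\d\d(?!\d)",
    "month_day": r"(?<!\d)1?\d_[1-3]?\d(?!\d)",
    "year_month_day": r"(?<!\d)(20|19)\d\d_1?\d_[1-3]?\d(?!\d)",
}


def find_candidates(text):
    """Run every pattern independently over the same text.

    Because each pattern gets its own scan, a character can appear in
    matches of different layouts, even though a single scan never
    reuses a character.
    """
    return {
        name: [m.group(0) for m in re.finditer(pattern, text)]
        for name, pattern in DATE_PATTERNS.items()
    }


if __name__ == "__main__":
    for name, found in find_candidates("4_2_2012_13_39").items():
        print(name, found)
```

The loose month and day parts still let 2012_13_39 through, so a validity check on the captured values would be a sensible follow-up step.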
Upvotes: 1