user2647071


How to get all possible regex matches in a string

I am attempting to write a program that will look for series of numbers in a string that could be interpreted as dates. To that end, I have written a regular expression which I run like this:

Dim m As MatchCollection = Regex.Matches(_string, "[0-9]{1,4}[-_ ]?[0-9]{1,2}([-_ ]?[0-9]{2,4})?")

Now, when I give it some weird string like "4_2_2012_13_39", I would expect it to return the following nine matches:

  1. 4_2
  2. 4_2_20
  3. 4_2_2012
  4. 2_20
  5. 2012
  6. 2012_13_39
  7. 12_13
  8. 12_13_39
  9. 13_39

(I have a secondary step that would discard numbers 6 and 9 for not having any number in the range for a month value.) In fact, I get only two matches: "4_2_2012" and "13_39". I think it is trying not to use the same character in two matches. Is there a way to make it allow overlapping matches? Thank you for any help.

Rob
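One brute-force way to get every possible match (not taken from either answer, and not a built-in .NET feature) is to test every substring of the input against the pattern anchored with \A(?:...)\z and keep the ones that match in full. The helper name AllPossibleMatches is hypothetical. It runs the regex O(n²) times, which should be acceptable for short strings like these.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module AllMatchesSketch
    ' Hypothetical helper: returns every substring of input that the pattern
    ' matches in full. \A and \z anchor the pattern to the whole substring.
    ' The same text can be returned twice if it occurs at two positions.
    Function AllPossibleMatches(input As String, pattern As String) As List(Of String)
        Dim anchored As New Regex("\A(?:" & pattern & ")\z")
        Dim results As New List(Of String)()
        For start As Integer = 0 To input.Length - 1
            For length As Integer = 1 To input.Length - start
                Dim candidate As String = input.Substring(start, length)
                If anchored.IsMatch(candidate) Then results.Add(candidate)
            Next
        Next
        Return results
    End Function

    Sub Main()
        Dim pattern As String = "[0-9]{1,4}[-_ ]?[0-9]{1,2}([-_ ]?[0-9]{2,4})?"
        For Each found As String In AllPossibleMatches("4_2_2012_13_39", pattern)
            Console.WriteLine(found)
        Next
    End Sub
End Module
```

This returns all nine substrings listed above, but also others such as 20, 01 and 012_13_39, because the pattern has no digit boundaries; the secondary filtering step would have to discard those as well.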

Upvotes: 1

Views: 616

Answers (2)

Martijn

Reputation: 12102

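Regex.Matches will not give you every possible match. The matches it returns never overlap: it takes the first (greedy) match at the leftmost position it can, then resumes searching after the end of that match. It is just like matching abcdef against .*, which returns the match abcdef, not all possible substrings of it (so not a, f, or bcd).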
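The sketch below runs the question's pattern twice to show the difference. The second run wraps it in a zero-width lookahead with a capture group, (?=(...)), a common trick (not part of this answer) that lets the engine try every start position; group 1 then holds the first, greedy match found at that position. Module and function names are illustrative.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module OverlapSketch
    Const DatePattern As String = "[0-9]{1,4}[-_ ]?[0-9]{1,2}([-_ ]?[0-9]{2,4})?"

    ' Plain Regex.Matches: each search resumes after the previous match ends.
    Function NonOverlapping(input As String) As String()
        Dim matches As MatchCollection = Regex.Matches(input, DatePattern)
        Return matches.Cast(Of Match)().Select(Function(m) m.Value).ToArray()
    End Function

    ' Lookahead trick: the match itself is empty, so the engine moves on by one
    ' character each time; group 1 captures the greedy match at that position.
    Function OnePerStart(input As String) As String()
        Dim matches As MatchCollection = Regex.Matches(input, "(?=(" & DatePattern & "))")
        Return matches.Cast(Of Match)().Select(Function(m) m.Groups(1).Value).ToArray()
    End Function

    Sub Main()
        ' 4_2_2012, 13_39
        Console.WriteLine(String.Join(", ", NonOverlapping("4_2_2012_13_39")))
        ' 4_2_2012, 2_2012, 2012_13_39, 012_13_39, 12_13_39, 2_13_39, 13_39, 3_39, 39
        Console.WriteLine(String.Join(", ", OnePerStart("4_2_2012_13_39")))
    End Sub
End Module
```

Even the second run yields only one match per start position, so shorter candidates such as 4_2 or 2_20 still do not appear.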

Upvotes: 0

paparazzo

Reputation: 45096

Why do you want strings that could be interpreted as dates but are not valid dates? 2012_13_39 is not a valid date.

You could run an independent regex for each date format.

This looks for a 4-digit year starting with 19 or 20.
The negative lookbehind and negative lookahead require that the number is not preceded or followed by a digit, so only a stand-alone number matches:

(?<!\d)(20|19)\d\d(?!\d) 
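A quick check of the year pattern, wrapped in a hypothetical FindYears helper:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module YearSketch
    ' Year pattern from the answer: 19xx or 20xx, not touching another digit.
    Const YearPattern As String = "(?<!\d)(20|19)\d\d(?!\d)"

    ' Hypothetical helper: collects the text of every year match.
    Function FindYears(input As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(input, YearPattern)
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        Console.WriteLine(String.Join(", ", FindYears("4_2_2012_13_39").ToArray()))  ' 2012
        ' "2012" is buried inside a longer run of digits here, so it is rejected.
        Console.WriteLine(FindYears("120123").Count)  ' 0
    End Sub
End Module
```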

This looks for month_day:

(?<!\d)1?\d_[1-3]?\d(?!\d)

You could be even more restrictive, since this still allows values such as 19_39.
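Running the month_day pattern over the question's string shows that looseness: it accepts 13_39. StrictPattern below is an assumed tightening for illustration (month 1 to 12, day 1 to 31, optional leading zero), not part of the answer, and it still accepts impossible dates such as 2_31.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MonthDaySketch
    ' month_day pattern from the answer.
    Const LoosePattern As String = "(?<!\d)1?\d_[1-3]?\d(?!\d)"
    ' Assumed stricter variant: month 1-12 and day 1-31, optional leading zero.
    Const StrictPattern As String = "(?<!\d)(0?[1-9]|1[0-2])_(0?[1-9]|[12]\d|3[01])(?!\d)"

    ' Hypothetical helper: collects the text of every match of pattern.
    Function FindAll(input As String, pattern As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(input, pattern)
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        Console.WriteLine(String.Join(", ", FindAll("4_2_2012_13_39", LoosePattern).ToArray()))   ' 4_2, 13_39
        Console.WriteLine(String.Join(", ", FindAll("4_2_2012_13_39", StrictPattern).ToArray()))  ' 4_2
    End Sub
End Module
```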

Year at the beginning (year_month_day):

(?<!\d)(20|19)\d\d_1?\d_[1-3]?\d(?!\d)
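Applying the year-first pattern, via a hypothetical FindYearFirst helper, to a well-formed string and to the question's string. The second result shows that the month and day parts share the looseness of the month_day pattern.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module YearFirstSketch
    ' year_month_day pattern from the answer.
    Const YearFirstPattern As String = "(?<!\d)(20|19)\d\d_1?\d_[1-3]?\d(?!\d)"

    ' Hypothetical helper: collects the text of every year-first match.
    Function FindYearFirst(input As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(input, YearFirstPattern)
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        Console.WriteLine(String.Join(", ", FindYearFirst("2012_4_2").ToArray()))        ' 2012_4_2
        Console.WriteLine(String.Join(", ", FindYearFirst("4_2_2012_13_39").ToArray()))  ' 2012_13_39
    End Sub
End Module
```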

I am not going to build them all for you, but these are the tools to do it.
(?<!\d) and (?!\d) work as boundaries here; \b would not, because _ counts as a word character.
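The suggestion of one independent regex per date format can be sketched as a loop over a small table of patterns. The table layout and the FindCandidates name are illustrative assumptions; the three patterns are the ones given above. Because each regex runs separately, the same characters can appear in matches from different formats (2012 and 2012_13_39 below), which is the overlap the question asked for. The last line of Main shows why lookarounds are used instead of \b.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CombinedSketch
    ' Illustrative table: column 0 is a format name, column 1 its pattern.
    Private ReadOnly Formats(,) As String = {
        {"year", "(?<!\d)(20|19)\d\d(?!\d)"},
        {"month_day", "(?<!\d)1?\d_[1-3]?\d(?!\d)"},
        {"year_month_day", "(?<!\d)(20|19)\d\d_1?\d_[1-3]?\d(?!\d)"}
    }

    ' Hypothetical helper: runs every pattern on its own and labels each match.
    Function FindCandidates(input As String) As List(Of String)
        Dim results As New List(Of String)()
        For i As Integer = 0 To Formats.GetLength(0) - 1
            For Each m As Match In Regex.Matches(input, Formats(i, 1))
                results.Add(Formats(i, 0) & ": " & m.Value)
            Next
        Next
        Return results
    End Function

    Sub Main()
        ' year: 2012 / month_day: 4_2 / month_day: 13_39 / year_month_day: 2012_13_39
        For Each line As String In FindCandidates("4_2_2012_13_39")
            Console.WriteLine(line)
        Next
        ' "_" is a word character, so \b finds no boundary between "_" and "2012".
        Console.WriteLine(Regex.IsMatch("4_2_2012", "\b\d{4}\b"))  ' False
    End Sub
End Module
```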

Upvotes: 1
