Reputation: 2157
I have dates in my .txt file that come in either of the formats below:
mmddyyyy
OR
mm/dd/yyyy
Below is the regex, which works fine for mm/dd/yyyy:
^02\/(?:[01]\d|2\d)\/(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)\/(?:[0-2]\d|3[01])\/(?:19|20)\d{2}|(?:0[469]|11)\/(?:[0-2]\d|30)\/(?:19|20)\d{2}|02\/(?:[0-1]\d|2[0-8])\/(?:19|20)\d{2}$
However, I am unable to build the regex for mmddyyyy. Is there a generic regex that would work for both cases?
Upvotes: 0
Views: 168
Reputation: 13922
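Why use regex for this? Seems like a case of "Now you have two problems".
It would be more effective (and easier to understand) to use a DateTimeFormatter (assuming you are on the JVM and not using scala-js).
The format patterns support using [] to surround optional sections, such as the /, and the formatter validates its input, so a month or day outside its valid range (month 25, say) throws an exception. One caveat: under the default SMART resolver style, a day that is in range but does not exist in the given month (such as 02/30) is clamped to the last day of that month rather than rejected. Use ResolverStyle.STRICT (with uuuu in place of yyyy) if that must fail too.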
import java.time.format.DateTimeFormatter
import java.time.LocalDate
val mdy = DateTimeFormatter.ofPattern("MM[/]dd[/]yyyy")
def parse(rawDate: String) = LocalDate.parse(rawDate, mdy)
scala> parse("12252022")
res7: java.time.LocalDate = 2022-12-25
scala> parse("12/25/2022")
res8: java.time.LocalDate = 2022-12-25
scala> parse("25/12/2022")
java.time.format.DateTimeParseException: Text '25/12/2022' could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 25
scala> parse("abc123")
java.time.format.DateTimeParseException: Text 'abc123' could not be parsed at index 0
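If impossible calendar dates should be rejected as well, a strict variant of the same formatter is one way to do it. The following is a minimal sketch, not the answer's original code: the object name StrictDates and the helper parseStrict are hypothetical, and returning an Option instead of throwing is just one possible design.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, ResolverStyle}
import scala.util.Try

object StrictDates {
  // "uuuu" (proleptic year) replaces "yyyy" here: under STRICT resolution,
  // year-of-era ("yyyy") cannot be resolved to a date without an era field.
  private val strictMdy: DateTimeFormatter =
    DateTimeFormatter
      .ofPattern("MM[/]dd[/]uuuu")
      .withResolverStyle(ResolverStyle.STRICT)

  // Hypothetical helper: Some(date) on success, None for anything
  // that fails to parse or is not a real calendar date.
  def parseStrict(raw: String): Option[LocalDate] =
    Try(LocalDate.parse(raw, strictMdy)).toOption
}
```

Because the two optional sections are independent of each other, a mixed input with only one slash would presumably still be accepted; if that matters, check the separators separately.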
Upvotes: 2
Reputation: 350841
First of all, there is a tiny shortcoming in your regex: the ^ anchor only applies to the first part of your regex, not to the other alternatives that are separated by |. Similarly, the final $ applies only to the final alternative. You should put all alternatives in a non-capturing group, like ^(?: | | | )$
Then for the question itself, you could make the forward slash that follows the month optional and put it in a capture group. Then what comes between the day and the year could be a backreference to that capture group. So (\/?) and \1. Each alternative gets its own capture group, hence \1 through \4 below:
^(?:02(\/?)(?:[01]\d|2\d)\1(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)(\/?)(?:[0-2]\d|3[01])\2(?:19|20)\d{2}|(?:0[469]|11)(\/?)(?:[0-2]\d|30)\3(?:19|20)\d{2}|02(\/?)(?:[0-1]\d|2[0-8])\4(?:19|20)\d{2})$
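To see the backreference idea in action from Scala, here is a rough sketch that assembles the same pattern from smaller pieces. The object name SlashOrNot, the helper isValid, and the intermediate vals are hypothetical names chosen for illustration; the month, day, and year rules are taken over unchanged from the regex above.

```scala
object SlashOrNot {
  // Year fragments reused by the alternatives below.
  private val leapYear = """(?:19|20)(?:0[048]|[13579][26]|[2468][048])"""
  private val anyYear  = """(?:19|20)\d{2}"""

  // Each alternative captures its own optional slash and requires the same
  // text (a slash or nothing) again between day and year via a backreference.
  private val mdy = (
    "^(?:" +
      """02(/?)(?:[01]\d|2\d)\1""" + leapYear + "|" +
      """(?:0[13578]|10|12)(/?)(?:[0-2]\d|3[01])\2""" + anyYear + "|" +
      """(?:0[469]|11)(/?)(?:[0-2]\d|30)\3""" + anyYear + "|" +
      """02(/?)(?:[01]\d|2[0-8])\4""" + anyYear +
    ")$"
  ).r

  // True only when the whole input is mm/dd/yyyy or mmddyyyy.
  def isValid(s: String): Boolean = mdy.pattern.matcher(s).matches()
}
```

The backreference is what rules out mixed input such as 12/252022. The sketch keeps the original pattern's quirks, for instance a day of 00 is still accepted.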
Upvotes: 0
Reputation: 163477
If you want to match all those variations with either 2 forward slashes or only digits, you can use a positive lookahead to assert either only digits or 2 forward slashes surrounded by digits.
Then in the pattern itself you can make matching the / optional.
Note that you don't have to escape the /
^(?=\d+(?:/\d+/\d+)?$)(?:02/?(?:[01]\d|2\d)/?(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)/?(?:[0-2]\d|3[01])/?(?:19|20)\d{2}|(?:0[469]|11)/?(?:[0-2]\d|30)/?(?:19|20)\d{2}|02/?(?:[0-1]\d|2[0-8])/?(?:19|20)\d{2})$
Another option is to write an alternation with | matching the same pattern without the / in it.
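As a rough illustration of that alternation approach, the slash-less variant can be derived from the slashed one instead of being typed out twice. This is only a sketch under assumptions: the object name EitherFormat and the helper isValid are hypothetical, and the date rules are copied from the question's regex.

```scala
object EitherFormat {
  // Body of the slash-separated pattern, without anchors.
  private val slashed =
    """02/(?:[01]\d|2\d)/(?:19|20)(?:0[048]|[13579][26]|[2468][048])""" + "|" +
    """(?:0[13578]|10|12)/(?:[0-2]\d|3[01])/(?:19|20)\d{2}""" + "|" +
    """(?:0[469]|11)/(?:[0-2]\d|30)/(?:19|20)\d{2}""" + "|" +
    """02/(?:[01]\d|2[0-8])/(?:19|20)\d{2}"""

  // The same body with every separator removed, for the mmddyyyy form.
  private val plain = slashed.replace("/", "")

  // Anchor both variants together inside one non-capturing group.
  private val either = ("^(?:" + slashed + "|" + plain + ")$").r

  def isValid(s: String): Boolean = either.pattern.matcher(s).matches()
}
```

The resulting pattern is roughly twice as long, but it needs neither a lookahead nor backreferences, and a mixed input with a single slash matches neither branch.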
Upvotes: 1