whatsinthename

Reputation: 2157

Regex pattern for mm/dd/yyyy and mmddyyyy in Scala

I have dates in my .txt file that arrive in either of the formats below:

mmddyyyy

OR

mm/dd/yyyy

Below is the regex which works fine for mm/dd/yyyy.

^02\/(?:[01]\d|2\d)\/(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)\/(?:[0-2]\d|3[01])\/(?:19|20)\d{2}|(?:0[469]|11)\/(?:[0-2]\d|30)\/(?:19|20)\d{2}|02\/(?:[0-1]\d|2[0-8])\/(?:19|20)\d{2}$

However, I am unable to build a regex for mmddyyyy. Is there a generic regex that would work for both cases?

Upvotes: 0

Views: 168

Answers (3)

Dylan

Reputation: 13922

Why use regex for this? Seems like a case of "Now you have two problems".

It would be more effective (and easier to understand) to use a DateTimeFormatter (assuming you are on the JVM and not using scala-js).

The format patterns support using [] to surround optional sections, such as the /, and the formatters inherently perform input validation, so if you plug in a month outside 1-12 or a day outside 1-31, it'll throw an exception. Note that with the default SMART resolver style, a day that is in range but too large for its month (02/30, for example) is adjusted to the last day of that month rather than rejected.

import java.time.format.DateTimeFormatter
import java.time.LocalDate

// Square brackets mark optional sections, so each "/" may be present or absent.
val mdy = DateTimeFormatter.ofPattern("MM[/]dd[/]yyyy")
def parse(rawDate: String): LocalDate = LocalDate.parse(rawDate, mdy)

scala> parse("12252022")
res7: java.time.LocalDate = 2022-12-25

scala> parse("12/25/2022")
res8: java.time.LocalDate = 2022-12-25

scala> parse("25/12/2022")
java.time.format.DateTimeParseException: Text '25/12/2022' could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 25

scala> parse("abc123")
java.time.format.DateTimeParseException: Text 'abc123' could not be parsed at index 0
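If impossible dates such as February 29 in a non-leap year should be rejected outright, one option is a strict resolver. The following is a minimal sketch, not part of the answer above: the names strictMdy and parseStrict are hypothetical, and the pattern uses uuuu instead of yyyy because strict resolution cannot resolve a year-of-era without an era.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, ResolverStyle}
import scala.util.Try

// Same optional-slash pattern, but with "uuuu" (proleptic year) so that
// STRICT resolution can build a LocalDate without an era field.
val strictMdy: DateTimeFormatter =
  DateTimeFormatter
    .ofPattern("MM[/]dd[/]uuuu")
    .withResolverStyle(ResolverStyle.STRICT)

// Hypothetical helper: returns None instead of throwing on invalid input.
def parseStrict(rawDate: String): Option[LocalDate] =
  Try(LocalDate.parse(rawDate, strictMdy)).toOption
```

Because each bracketed section is optional on its own, this pattern would also accept mixed input such as 12/252022. If that matters, a length or separator check before parsing may be needed.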

Upvotes: 2

trincot

Reputation: 350841

First of all, there is a tiny shortcoming in your regex: the ^ anchor only applies to the first part of your regex, not to the other alternatives that are separated by |. Similarly the final $ applies only to the final alternative. You should put all alternatives in a non-capturing group, like ^(?: | | | )$
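To illustrate the anchoring point, here is a small sketch that uses two short stand-in alternatives (02 and 03) instead of the full date pattern. The names ungrouped, grouped and found are made up for this example.

```scala
import java.util.regex.Pattern

// Without a group, the alternation splits the whole pattern:
// this reads as (^02) OR (03$).
val ungrouped = Pattern.compile("^02|03$")

// With a non-capturing group, both anchors apply to every alternative.
val grouped = Pattern.compile("^(?:02|03)$")

// find() looks for a match anywhere in the input, which exposes the difference.
def found(p: Pattern, s: String): Boolean = p.matcher(s).find()
```

With find(), the ungrouped pattern accepts "02xyz" and "xyz03", while the grouped one accepts only "02" or "03". The difference is hidden when using matches(), since that already requires the whole input to match.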

Then for the question itself, you could make the forward slash that follows the month optional and put it in a capture group. Then what comes between the day and the year could be a backreference to that capture group. So (\/?) and \1.

^(?:02(\/?)(?:[01]\d|2\d)\1(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)(\/?)(?:[0-2]\d|3[01])\2(?:19|20)\d{2}|(?:0[469]|11)(\/?)(?:[0-2]\d|30)\3(?:19|20)\d{2}|02(\/?)(?:[0-1]\d|2[0-8])\4(?:19|20)\d{2})$
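As a rough sketch of how a backreference-based pattern like this could be used from Scala (the names backrefDate and isValidDate are hypothetical), a triple-quoted string avoids double-escaping the backslashes:

```scala
// Triple-quoted string: backslashes reach the regex engine unchanged.
// Each alternative captures its optional first slash and requires the
// same text (slash or nothing) again before the year.
val backrefDate = """^(?:02(\/?)(?:[01]\d|2\d)\1(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)(\/?)(?:[0-2]\d|3[01])\2(?:19|20)\d{2}|(?:0[469]|11)(\/?)(?:[0-2]\d|30)\3(?:19|20)\d{2}|02(\/?)(?:[0-1]\d|2[0-8])\4(?:19|20)\d{2})$""".r

// Hypothetical helper: true when the whole string is an accepted date.
def isValidDate(s: String): Boolean = backrefDate.findFirstIn(s).isDefined
```

Under this pattern, 12/25/2022 and 12252022 are both accepted, while mixed forms such as 12/252022 are rejected because the backreference must repeat whatever the first separator matched.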

Upvotes: 0

The fourth bird

Reputation: 163477

If you want to match all those variations with either 2 forward slashes or only digits, you can use a positive lookahead to assert either only digits or 2 forward slashes surrounded by digits.

Then in the pattern itself you can make matching the / optional.

Note that you don't have to escape the forward slash; a plain / works in place of \/.

^(?=\d+(?:/\d+/\d+)?$)(?:02/?(?:[01]\d|2\d)/?(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)/?(?:[0-2]\d|3[01])/?(?:19|20)\d{2}|(?:0[469]|11)/?(?:[0-2]\d|30)/?(?:19|20)\d{2}|02/?(?:[0-1]\d|2[0-8])/?(?:19|20)\d{2})$
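A minimal Scala sketch of the lookahead approach follows, assuming every separator in the main pattern is written as an optional /? (including the one before the year in the last alternative). The names lookaheadDate and acceptsDate are hypothetical.

```scala
// The lookahead (?=\d+(?:/\d+/\d+)?$) allows either digits only or exactly
// two slashes surrounded by digits; the rest of the pattern validates the
// month, day and year with each slash optional.
val lookaheadDate = """^(?=\d+(?:/\d+/\d+)?$)(?:02/?(?:[01]\d|2\d)/?(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)/?(?:[0-2]\d|3[01])/?(?:19|20)\d{2}|(?:0[469]|11)/?(?:[0-2]\d|30)/?(?:19|20)\d{2}|02/?(?:[0-1]\d|2[0-8])/?(?:19|20)\d{2})$""".r

// Hypothetical helper: true when the whole string is an accepted date.
def acceptsDate(s: String): Boolean = lookaheadDate.findFirstIn(s).isDefined
```

The lookahead is what rules out inputs with a single slash, such as 12/252022, which the optional slashes alone would let through.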


Another option is to write an alternation | matching the same pattern without the / in it.
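One way to sketch that alternation without maintaining two hand-written copies is to derive the digits-only variant from the slash variant. This is an illustration only; the names withSlashes, digitsOnly, eitherForm and matchesEither are hypothetical.

```scala
// The mm/dd/yyyy alternatives with mandatory slashes and no anchors.
val withSlashes = """02/(?:[01]\d|2\d)/(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)/(?:[0-2]\d|3[01])/(?:19|20)\d{2}|(?:0[469]|11)/(?:[0-2]\d|30)/(?:19|20)\d{2}|02/(?:[0-1]\d|2[0-8])/(?:19|20)\d{2}"""

// The same alternatives with every slash removed, giving mmddyyyy.
val digitsOnly = withSlashes.replace("/", "")

// One group holds all alternatives so ^ and $ apply to each of them.
// "$$" inside an s-interpolated string produces a literal "$".
val eitherForm = s"^(?:$withSlashes|$digitsOnly)$$".r

// Hypothetical helper: true when the string matches either form.
def matchesEither(s: String): Boolean = eitherForm.findFirstIn(s).isDefined
```

Since each alternative has either both slashes or none, mixed input such as 12/252022 is rejected without needing a lookahead or backreference.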

Upvotes: 1
