Reputation: 148
I am trying to use a regex to match dates (from 2000 to 2099). The following regex works fine for that:
((((^20[02468][048])|(^20[13579][26]))-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[0-1]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-9]$)))|((^20\d{2})-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[01]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-8]$)))))|0000-00-00){1}
Note: it matches a leap year (days in month: 31, 30, 29), a normal year (days in month: 31, 30, 28), or the default (0000-00-00).
However, it matches the empty string too. I searched for a solution like this one, but my regex is far more complex. I tried adding {1,} as the linked suggestion recommends, but it doesn't work.
I also don't understand why it matches the empty string in the first place. Could anyone explain that too?
Upvotes: 1
Views: 256
Reputation: 753535
I don't have a direct answer to the problem with the empty string being accepted. I don't think the trouble is in the regex — neither the original nor the revised version. I think the suggestion by dtanders is probably on track; your comments support that.
However, I think there is room to simplify and improve your regex.
There are a lot of unnecessary parentheses in the regex, and the logic seems a bit convoluted. You check for 0000-00-00 (actually the last option, but by far the shortest to describe), or for 'any valid date in any leap year' or for 'any valid date in a non-leap year'. That leads to a major repeated chunk of regex for validating all the invariant months.
There'd be less repetition if you restructured your code to test for 0000-00-00 or any valid day in any month or any valid leap day.
In Perl, you can write extended regular expressions where spaces aren't significant, and the regex can be spread over many lines to make it easier to understand. This leads to a test script like:
#!/usr/bin/env perl
use strict;
use warnings;

my $rx = qr/
    ^(
        20\d{2} -
        (   (0[13578] | 1[02]) - (0[1-9] | [12]\d | 3[01])
        |   ((0[469] | 11) - (0[1-9] | [12]\d | 30))
        |   (02 - (0[1-9] | 1\d | 2[0-8]))
        )
    |   (20[02468][048] | 20[13579][26]) - 02 - 29     # Leap day
    |   0000-00-00
    )$
/x;

while (<>)
{
    chomp;
    printf "%s: %s\n", (m/$rx/ ? "PASS" : "FAIL"), $_;
}
If the regex is flattened onto a single line (and the comment removed), then you get:
^(20\d{2}-((0[13578]|1[02])-(0[1-9]|[12]\d|3[01])|((0[469]|11)-(0[1-9]|[12]\d|30))|(02-(0[1-9]|1\d|2[0-8])))|(20[02468][048]|20[13579][26])-02-29|0000-00-00)$
The original regex occupies 276 characters. The revision occupies 158 when flattened.
I called the script regex-hell and created a file various-dates with various sample dates in it. The output was:
PASS: 0000-00-00
FAIL: 0001-00-00
FAIL: 0000-01-00
FAIL: 0000-00-01
FAIL: 2000-00-00
FAIL: 2000-01-00
FAIL: 2000-00-01
PASS: 2000-01-01
PASS: 2000-02-28
PASS: 2000-02-29
PASS: 2001-02-28
FAIL: 2001-02-29
PASS: 2003-03-31
FAIL: 2003-03-32
PASS: 2004-04-30
FAIL: 2004-04-31
PASS: 2005-05-31
FAIL: 2005-05-32
FAIL: 2005-05-00
PASS: 2005-05-01
PASS: 2006-06-30
FAIL: 2006-06-31
PASS: 2007-07-31
FAIL: 2007-07-32
PASS: 2008-08-31
FAIL: 2008-08-32
PASS: 2009-09-30
FAIL: 2009-09-31
FAIL: 2009-09-32
PASS: 2010-10-30
PASS: 2010-10-31
FAIL: 2010-10-32
PASS: 2011-11-30
FAIL: 2011-11-31
PASS: 2012-12-31
FAIL: 2012-12-32
PASS: 2099-01-01
PASS: 2099-12-31
FAIL:
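Since the question is about an HTML input, the flattened pattern can also be tried as a JavaScript RegExp literal. This is only a sketch: the names dateRx and check are made up for illustration, and the sample inputs are a small subset of the dates listed in the output above.

```javascript
// Flattened pattern from this answer, written as a JavaScript RegExp literal.
const dateRx = /^(20\d{2}-((0[13578]|1[02])-(0[1-9]|[12]\d|3[01])|((0[469]|11)-(0[1-9]|[12]\d|30))|(02-(0[1-9]|1\d|2[0-8])))|(20[02468][048]|20[13579][26])-02-29|0000-00-00)$/;

// Hypothetical helper: mirrors the PASS/FAIL labels printed by the Perl script.
function check(s) {
  return (dateRx.test(s) ? "PASS" : "FAIL") + ": " + s;
}

// A few of the sample dates, plus the empty string.
for (const s of ["0000-00-00", "2000-02-29", "2001-02-29", "2004-04-31", ""]) {
  console.log(check(s));
}
```

Because the whole alternation sits between ^( and )$, the empty string has nothing to match against and is rejected here as well.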
Upvotes: 0
Reputation: 424983
I don't think it matches the empty string, and neither does Rubular. Either way, you can add an anchored negative look-ahead for end of input, ^(?!$), to your regex to prevent a blank from matching:
^(?!$)((((^20[02468][048])|(^20[13579][26]))-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[0-1]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-9]$)))|((^20\d{2})-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[01]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-8]$)))))|0000-00-00){1}
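To see what the look-ahead does in isolation, here is a small JavaScript sketch. The patterns are deliberately simplified stand-ins, not the date regex itself: digitsOnly is a hypothetical pattern that does accept the empty string, and nonEmptyDigits is the same pattern with (?!$) placed right after the ^ anchor.

```javascript
// Simplified stand-in: zero or more digits, so "" is accepted.
const digitsOnly = /^\d*$/;

// Same pattern with a negative look-ahead after ^: the match fails if the
// end of input comes immediately, that is, if the string is empty.
const nonEmptyDigits = /^(?!$)\d*$/;

console.log(digitsOnly.test(""));         // true
console.log(nonEmptyDigits.test(""));     // false
console.log(nonEmptyDigits.test("2014")); // true
```

The look-ahead consumes no characters, so it can be prepended without changing what the rest of the pattern matches on non-empty input.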
Upvotes: 0
Reputation: 1845
Add the required attribute to the input:
<input pattern="(((^20[02468][048])|(^20[13579][26]))-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[0-1]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-9]$)))|((^20\d{2})-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[01]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-8]$)))))|0000-00-00"
type="text"
required/>
The browser won't try to validate an empty input that doesn't have a required attribute.
http://jsfiddle.net/kyaLhqpu/ vs http://jsfiddle.net/kyaLhqpu/1/
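The rule in the sentence above can be modelled outside the browser. The sketch below is a simplified approximation of how pattern and required interact, not the browser's actual code; passesValidation is a hypothetical helper, and a short stand-in pattern is used instead of the full date regex.

```javascript
// Hypothetical helper: a rough model of pattern/required checking.
function passesValidation(value, pattern, required) {
  if (value === "") {
    // An empty value is never tested against the pattern;
    // it is rejected only when the field is required.
    return !required;
  }
  // The pattern must match the whole value, so wrap it in ^(?: ... )$.
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

const pattern = "\\d{4}-\\d{2}-\\d{2}"; // stand-in, not the full date regex

console.log(passesValidation("", pattern, false));           // true
console.log(passesValidation("", pattern, true));            // false
console.log(passesValidation("2014-09-30", pattern, true));  // true
console.log(passesValidation("not a date", pattern, false)); // false
```

In this model the empty string gets through only because the pattern is never consulted for it, which fits the observation that the regex itself does not match an empty string.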
Upvotes: 1
Reputation: 52185
It might make more sense to extract that regular expression from the input tag and simply create a JavaScript function whose aim is to validate the input.
The validation will be twofold:
Note, though, that using JavaScript could have the added advantage of letting you use actual mathematical operators such as <, > and = to perform numeric range validation, as opposed to what you are doing now. The end result should be easier to understand and to change should the need arise in the future.
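One way such a function might look is sketched below. The names isValidDate and daysInMonth are hypothetical, and keeping 0000-00-00 as an accepted value is an assumption carried over from the question. The split into a shape check followed by range checks is also an assumption, since the steps are not listed in this answer.

```javascript
// Number of days in a month (1-12) for a given year, leap years included.
function daysInMonth(year, month) {
  const isLeap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  const days = [31, isLeap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return days[month - 1];
}

// Hypothetical validator: accepts YYYY-MM-DD dates from 2000 to 2099,
// plus the placeholder 0000-00-00 from the question.
function isValidDate(text) {
  if (text === "0000-00-00") return true;

  // Check the shape only: four digits, two digits, two digits.
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (m === null) return false;

  // Check the numeric ranges with ordinary comparisons.
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  if (year < 2000 || year > 2099) return false;
  if (month < 1 || month > 12) return false;
  return day >= 1 && day <= daysInMonth(year, month);
}

console.log(isValidDate("2000-02-29")); // true
console.log(isValidDate("2001-02-29")); // false
console.log(isValidDate(""));           // false
```

Changing the accepted year range then means editing two numbers rather than rewriting character classes.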
Upvotes: 0