Reputation: 15
I'm working on a date parser (a tenor parser, to be exact) that has to extract durations expressed as a sum of days, weeks, months and years.
This would look like 1d1y or 100m2w or 1y1d1m1w.
I've built a custom parser, but I'm looking for a cleaner solution using regex. I have to check that each date character (d, w, m and y) occurs at most once and that each one is preceded by an integer.
^(?<ValueDay>[0-9]+(d))?(?<ValueWeek>[0-9]+(w))?(?<ValueMonth>[0-9]+(m))?(?<ValueYear>[0-9]+(y))?$
The problem is that the units can appear in any order (1d1w is as valid as 1w1d). I tried using positive lookaheads (?=) as follows, but the pattern doesn't meet all the criteria.
^(?=.*(?<ValueDay>[0-9]+(d)))?(?=.*(?<ValueWeek>[0-9]+(w)))?(?=.*(?<ValueMonth>[0-9]+(m)))?(?=.*(?<ValueYear>[0-9]+(y)))?.*$
How could I do this?
Upvotes: 0
Views: 228
Reputation: 6103
Regex r = new Regex(@"^(\d+)([wydm])(?!.*\2)"
    + @"(?:(\d+)([wydm])(?!.*\4))?"
    + @"(?:(\d+)([wydm])(?!.*\6))?"
    + @"(?:(\d+)([wydm]))?$");
This should work. It matches \d+[wydm] at least once and at most four times. In addition, whenever a character from [wydm] is matched, a negative lookahead ensures that the same character does not occur again later in the text. Now you can get the values from the groups:
int GetValue(Match m)
{
    // Converts one (number, unit) pair to days; an unmatched optional pair counts as 0.
    int GetGroupValue(Group numberGroup, Group characterGroup)
    {
        if (!numberGroup.Success) { return 0; }
        int number = int.Parse(numberGroup.Value);
        switch (characterGroup.Value)
        {
            case "d": return number;
            case "w": return 7 * number;
            case "m": return 31 * number;
            case "y": return 365 * number;
            default: throw new NotSupportedException(characterGroup.Value + " is not supported");
        }
    }
    return GetGroupValue(m.Groups[1], m.Groups[2])
        + GetGroupValue(m.Groups[3], m.Groups[4])
        + GetGroupValue(m.Groups[5], m.Groups[6])
        + GetGroupValue(m.Groups[7], m.Groups[8]);
}
Here are some tests to verify the correctness:
var tests = new (string s, bool isOk, int desiredValue)[] {
    ("15y", true, 15*365),
    ("1000w", true, 1000*7),
    ("10000d", true, 10000),
    ("100000m", true, 100000*31),
    ("15y8y", false, 0),
    ("", false, 0),
    ("7y9w12m2d", true, 7*365 + 9*7 + 12*31 + 2),
    ("7d9m12w2y", true, 7 + 9*31 + 12*7 + 2*365),
    ("7y9w12m2dd", false, 0),
    ("7y9w12m2y", false, 0),
    ("7y9w12m2x", false, 0),
    ("-5y", false, 0),
    ("1", false, 0),
    ("y2", false, 0),
    ("yd", false, 0),
    ("7y1", false, 0),
    ("m5d", false, 0)
};
foreach (var test in tests)
{
    Match m = r.Match(test.s);
    // The match result must agree with the expected validity.
    if (m.Success != test.isOk)
    {
        throw new Exception("Test failed for " + test.s);
    }
    // For a failed match every group is unsuccessful, so GetValue returns 0.
    if (GetValue(m) != test.desiredValue)
    {
        throw new Exception("Test failed for " + test.s);
    }
}
MessageBox.Show("All " + tests.Length + " tests passed");
Upvotes: 0
Reputation: 50104
If each group must occur zero or one time, you can use the following (the pattern is spread over several lines, so it needs RegexOptions.IgnorePatternWhitespace):
^
(
(?(y)(?!)|(?<y>\d+)y)
|
(?(m)(?!)|(?<m>\d+)m)
|
(?(w)(?!)|(?<w>\d+)w)
|
(?(d)(?!)|(?<d>\d+)d)
)+
$
For each letter, it checks whether the group named after that letter has already captured something. If so, that alternative fails and the engine moves on to the next letter. If not, it tries to capture digits followed by that letter into the group named after the letter.
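As a minimal sketch of how the pattern above might be used from C#, the following wraps it in a helper and reads the four named groups. The class name TenorSketch and the TryParse signature are hypothetical and only serve the illustration. A unit that is absent from the input is reported as 0.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TenorSketch
{
    // Same pattern as above. IgnorePatternWhitespace is required because
    // the pattern contains line breaks and indentation.
    private static readonly Regex Pattern = new Regex(@"
        ^
        (
            (?(y)(?!)|(?<y>\d+)y)
          |
            (?(m)(?!)|(?<m>\d+)m)
          |
            (?(w)(?!)|(?<w>\d+)w)
          |
            (?(d)(?!)|(?<d>\d+)d)
        )+
        $",
        RegexOptions.IgnorePatternWhitespace);

    // Returns false when the input is not a valid tenor.
    // Note: int.Parse throws OverflowException for numbers that do not fit in an int.
    public static bool TryParse(string input, out int days, out int weeks,
                                out int months, out int years)
    {
        days = weeks = months = years = 0;
        Match match = Pattern.Match(input);
        if (!match.Success) { return false; }

        days = Read(match.Groups["d"]);
        weeks = Read(match.Groups["w"]);
        months = Read(match.Groups["m"]);
        years = Read(match.Groups["y"]);
        return true;
    }

    // A group that did not take part in the match counts as 0.
    private static int Read(Group group)
    {
        return group.Success ? int.Parse(group.Value) : 0;
    }
}
```

With this sketch, TenorSketch.TryParse("100m2w", out int d, out int w, out int m, out int y) should succeed with m = 100 and w = 2, while an input that repeats a unit, such as "15y8y", should be rejected.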
Previous answer - everything-happens-exactly-once version:
^((?<y>\d+)y|(?<m>\d+)m|(?<w>\d+)w|(?<d>\d+)d){4}$(?<-y>)(?<-m>)(?<-w>)(?<-d>)
This checks for:
- four occurrences of digits followed by y, m, w or d
- y was matched
- m was matched
- w was matched
- d was matched
Upvotes: 1