Reputation: 887
I have a scenario where I need to use a regex for validation.
The text I need to validate looks like this:
Valid Text:
name test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz
Invalid Text:
name te%st +company +phone 3434 +vehicle test + interested yyy +invited zzz
Here is the regular expression I wrote:
^(([a-z]*[A-Z]*\s?)+(\w*\s*)*\+)*$
The problem I am facing is that when the text is valid, Regex.Match(text) succeeds immediately. But when I add an invalid character inside the text, it takes too long and the debugger never returns.
Upvotes: 1
Views: 329
Reputation: 31616
"... is not valid it takes too long and debugger never returns."
You are asking the regex engine to consider too many scenarios, and it has to eliminate all of them before returning; hence the slowness.
Suggestion
Using *, which means zero or more occurrences, makes the regex parser re-think (backtrack) over other possible matches. In your pattern the * quantifiers also sit inside groups that are themselves repeated, which multiplies the number of ways the same text can be split up.
Think in terms of chess: there are literally millions of possible combinations. Using * is like saying "give me every possible move". But we only want the moves which are pertinent. The same is true of regex pattern smithing; keep it to the minimum.
Instead of *, prefer + if you truly know there will be one or more of the items and not zero. It keeps the backtracking to a minimum and makes for quicker parsing.
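While the pattern is being reworked, a match timeout can at least stop a runaway match from hanging the debugger. The following is a minimal sketch, assuming .NET Framework 4.5 or later; the class name, method name and time budget are illustrative and not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundedMatch
{
    // The pattern from the question, unchanged. Its nested quantifiers are
    // what make some invalid inputs so expensive to reject.
    private const string Pattern = @"^(([a-z]*[A-Z]*\s?)+(\w*\s*)*\+)*$";

    // Returns true or false when the engine finishes within the budget,
    // and null when it gives up (hypothetical helper, not a framework API).
    public static bool? TryIsMatch(string text, TimeSpan budget)
    {
        try
        {
            return Regex.IsMatch(text, Pattern, RegexOptions.None, budget);
        }
        catch (RegexMatchTimeoutException)
        {
            return null; // the match was abandoned after the budget elapsed
        }
    }
}
```

Calling BoundedMatch.TryIsMatch(text, TimeSpan.FromMilliseconds(500)) then either answers or reports that the match was abandoned. This only bounds the damage; the pattern itself still needs fixing.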
For your failure scenarios, instead of trying to match the world, why not fail the match by checking for invalid characters first? This can be done with a negative lookahead, the ^(?! ) pattern. Your rule calls for a failure when a non-valid character is found, so put this in first: ^(?!.+%). That says: if there is a % somewhere in the text (after the first character), fail the match.
Your example data is problematic, but in the spirit of what you want, as a jumping-off point I would begin with this pattern:
^(?!.+%)(\w+\s\w+\s\+\w+\s?)+
Which says: fail on a %, then there should be one or more repetitions of a pattern (word, space, word, space, +, word and an optional space).
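The "fail on invalid characters first" idea can be tried on its own before combining it with the rest of the pattern. This is a rough sketch under some assumptions: the class and method names are made up for illustration, the text is a single line, and the second check assumes that only word characters, whitespace and + are allowed, which is a guess at the rule. It uses .* rather than .+ inside the lookahead so that a % in the very first position is rejected as well.

```csharp
using System.Text.RegularExpressions;

public static class InvalidCharCheck
{
    // Negative lookahead anchored at the start: the match fails if a '%'
    // appears anywhere on the line.
    private static readonly Regex NoPercent = new Regex(@"^(?!.*%)");

    // Assumed generalisation: fail on any character that is not a word
    // character, whitespace or '+'.
    private static readonly Regex OnlyAllowed = new Regex(@"^(?!.*[^\w\s+])");

    public static bool HasNoPercent(string text) => NoPercent.IsMatch(text);

    public static bool HasOnlyAllowedChars(string text) => OnlyAllowed.IsMatch(text);
}
```

Both checks scan the text once and have no nested quantifiers, so they reject the invalid sample without the long wait.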
Upvotes: 1
Reputation: 4929
Why not a simple parser? Split on the '+' character, then evaluate each phrase. I'm assuming the first word before the space is the key, and the remainder is the value. There's also a regex check for invalid characters up front; anything other than letters, digits, spaces and '+' throws an exception.
using System;
using System.Linq;
using System.Text.RegularExpressions;

var working = "name test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz";
// Reject anything other than letters, digits, spaces and '+' up front.
if (Regex.IsMatch(working, "[^a-zA-Z0-9 +]"))
{
    throw new InvalidOperationException();
}
// Split into phrases and trim the whitespace around each one.
var values = working.Split('+').Select(x => x.Trim());
foreach (var phrase in values)
{
    // The first word is the key; everything after the first space is the value.
    var space = phrase.IndexOf(' ');
    if (space > 0)
    {
        var left = phrase.Substring(0, space).Trim();
        var right = phrase.Substring(space + 1).Trim();
        Console.WriteLine("left: [" + left + "], right: [" + right + "]");
    }
}
Console output:
left: [name], right: [test]
left: [company], right: [abc def]
left: [phone], right: [3434]
left: [vehicle], right: [test]
left: [interested], right: [yyy]
left: [invited], right: [zzz]
Running the above with an invalid character throws an exception:
var working = "na%me test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz";
...
Operation is not valid due to the current state of the object.
Upvotes: 0
Reputation: 5523
Instead of trying to come up with a regex that always works, why not rephrase the solution without regular expressions at all?
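using System.Linq;

var text = "name test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz";
var parts = text.Split('+');
var matches = parts.All(p =>
{
    var kvp = p.Trim().Split(' ');
    // A phrase needs a key plus at least one value word
    // ("company abc def" has two value words).
    if (kvp.Length < 2)
        return false;
    // Key: letters only. Value words: letters or digits.
    return kvp[0].All(char.IsLetter) && kvp.Skip(1).All(w => w.All(char.IsLetterOrDigit));
});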
This will cause a lot of allocations if you need to process large amounts of text, but it should be fine otherwise.
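If the allocations ever become a concern, one option is a single pass over the characters with no Split at all. The sketch below is only an illustration under assumed rules: phrases are separated by '+', each phrase is a key made of letters followed by one or more value words made of letters or digits, and words are separated by plain spaces. The class and method names are made up.

```csharp
public static class SinglePassValidator
{
    public static bool IsValid(string text)
    {
        int words = 0;        // words seen so far in the current phrase
        bool inWord = false;  // true while the scan is inside a word

        foreach (char c in text)
        {
            if (c == '+')
            {
                // A phrase must contain a key and at least one value word.
                if (words < 2) return false;
                words = 0;
                inWord = false;
            }
            else if (c == ' ')
            {
                inWord = false;
            }
            else
            {
                if (!inWord) { words++; inWord = true; }
                // First word (key): letters only. Later words: letters or digits.
                bool ok = words == 1 ? char.IsLetter(c) : char.IsLetterOrDigit(c);
                if (!ok) return false;
            }
        }
        // The last phrase has no trailing '+', so check it here.
        return words >= 2;
    }
}
```

Under these assumptions the valid sample from the question passes, while the invalid one fails at the % and would also fail on the empty "+company +phone" phrase.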
Upvotes: 0