
c# regex validation

Ahmad

Reputation: 887

Regex validation taking too long c#

I have a scenario to use regex for validation.

The text format I need to validate looks something like this:

Valid Text

name test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz

Invalid Text:

name te%st +company +phone 3434 +vehicle test + interested yyy +invited zzz

Rules

There should not be any other characters in the text, like the % above.
Also, the first word must be followed by a space, then there should be some text after that, and then the + sign.

Here is the regular expression I wrote:

^(([a-z]*[A-Z]*\s?)+(\w*\s*)*\+)*$

The problem I am facing is that when the text is valid, Regex.Match(text) returns a successful match immediately. But when I add some other character to the text that is not valid, it takes too long and the debugger never returns.

Upvotes: 1

Answers (3)

ΩmegaMan

Reputation: 31616

"...is not valid it takes too long and debugger never returns."

You are asking the parser to consider too many scenarios and it has to eliminate all of them before returning; hence the slowness.
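To see the effect without hanging the debugger, a match timeout can be attached to the regex. This is a minimal sketch: the 500 ms limit and the sample input (with the invalid character placed near the end, so that several segments match before the failure) are my assumptions, not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question: nested quantifiers over overlapping
// character classes give the engine many ways to split the same text.
const string pattern = @"^(([a-z]*[A-Z]*\s?)+(\w*\s*)*\+)*$";

// Hypothetical input: five well-formed segments, then a '%' in the last one.
const string invalid = "name test +company abc def +phone 3434 +vehicle test +interested yyy +invited z%z";

// A match timeout turns a potential hang into an exception that can be handled.
var regex = new Regex(pattern, RegexOptions.None, TimeSpan.FromMilliseconds(500));

bool matched = false;
bool timedOut = false;
try
{
    matched = regex.IsMatch(invalid);
}
catch (RegexMatchTimeoutException)
{
    // Thrown when the engine is still backtracking after the time limit.
    timedOut = true;
}

Console.WriteLine($"matched: {matched}, timed out: {timedOut}");
```

On the default backtracking engine I would expect this to hit the timeout rather than return false, but either way the input is never reported as a match.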

Suggestion

Using *, which means zero or more occurrences, makes the regex engine re-think (backtrack over) other possible matches.

Think in terms of chess: there are literally millions of possible combinations. Using * is like saying "give me every move possible". But we only want the moves that are pertinent. The same is true of regex pattern smithing: keep it to the minimum.

1. Instead of *, prefer + if you truly know there will be one or more of the items and not zero. It keeps backtracking to a minimum and makes for quicker parsing.
2. For your failure scenarios, instead of trying to match the world, why not fail the match by checking for invalids first? This can be done with a negative lookahead such as ^(?! ). Your rule mentions failing when invalid characters are found, so put this in first: ^(?!.*%). That says if there is a % anywhere in the text, fail the match.
3. Once #2 is done, just focus on the valid pattern(s), which gives the best-case scenario.

Your example data is problematic, but in the spirit of what you want, as a jumping-off point I would begin with this pattern:

^(?!.*%)(\w+\s\w+\s\+\w+\s?)+

This says fail on a %, then there should be one or more repetitions of a pattern (word, space, word, space, +word, and a possible space).
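The suggestions above can be combined roughly as follows. This is only a sketch: the segment layout (a letters-only key, a space, then one or more alphanumeric words, with an optional space on either side of each +) is my reading of the sample data, and the name IsValid and the one-second timeout are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed format (not confirmed by the question): segments separated by '+',
// each segment is a letters-only key, a space, then one or more alphanumeric
// words separated by single spaces.
const string Segment = @"[A-Za-z]+(?: [A-Za-z0-9]+)+";

var validator = new Regex(
    @"^(?!.*[^A-Za-z0-9 +])"               // fail first on any character outside the allowed set
    + Segment                               // first "key value" segment
    + @"(?: ?\+ ?" + Segment + @")*$",     // further segments, each introduced by '+'
    RegexOptions.CultureInvariant,
    TimeSpan.FromSeconds(1));               // safety net in case the pattern is changed later

bool IsValid(string text) => validator.IsMatch(text);

// Valid sample from the question.
Console.WriteLine(IsValid("name test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz")); // True
// Invalid sample from the question ('%' and a key with no value).
Console.WriteLine(IsValid("name te%st +company +phone 3434 +vehicle test + interested yyy +invited zzz"));       // False
```

Every quantified group here starts with a required literal (a space or a +), so at each position the engine has only one way to continue and a bad character fails quickly.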

Upvotes: 1

BurnsBA

Reputation: 4929

Why not a simple parser? Split on the '+' character, then evaluate each phrase. I'm assuming the first word before the space is the key, and the remainder is the value. There's also a regex that checks for valid characters; anything other than a letter, digit, space, or '+' throws an exception.

using System;
using System.Linq;

var working = "name test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz";

// Reject anything other than letters, digits, spaces, and '+'.
if (System.Text.RegularExpressions.Regex.IsMatch(working, "[^a-zA-Z0-9 +]"))
{
    throw new InvalidOperationException();
}

// Split into phrases on '+' and trim the surrounding whitespace.
var values = working.Split('+').Select(x => x.Trim());

foreach (var phrase in values)
{
    // The first word is the key; the remainder is the value.
    var space = phrase.IndexOf(' ');
    if (space > 0)
    {
        var left = phrase.Substring(0, space);
        var right = phrase.Substring(space + 1).Trim();

        Console.WriteLine("left: [" + left + "], right: [" + right + "]");
    }
}

Console output:

left: [name], right: [test]
left: [company], right: [abc def]
left: [phone], right: [3434]
left: [vehicle], right: [test]
left: [interested], right: [yyy]
left: [invited], right: [zzz]

Running the above with an invalid character throws an exception:

var working = "na%me test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz";  

...

Operation is not valid due to the current state of the object.
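One thing the loop above does not flag is a key with no value, such as the "+company +phone" part of the question's invalid sample: a phrase without a space is silently skipped. Below is a sketch of a variant that reports failure instead. TryParsePairs is a hypothetical helper name, and returning false rather than throwing is my choice, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns false instead of throwing, and also rejects
// a phrase that has a key but no value after it.
bool TryParsePairs(string text, out List<KeyValuePair<string, string>> pairs)
{
    pairs = new List<KeyValuePair<string, string>>();

    // Same character whitelist as the check in the answer.
    if (Regex.IsMatch(text, "[^a-zA-Z0-9 +]"))
        return false;

    foreach (var raw in text.Split('+'))
    {
        var phrase = raw.Trim();
        var space = phrase.IndexOf(' ');
        if (space <= 0)
            return false;   // empty phrase, or a key with nothing after it

        var key = phrase.Substring(0, space);
        var value = phrase.Substring(space + 1).Trim();
        pairs.Add(new KeyValuePair<string, string>(key, value));
    }
    return true;
}

Console.WriteLine(TryParsePairs("name test +company abc def +phone 3434", out var parsed)); // True
Console.WriteLine(parsed.Count);                                                            // 3
Console.WriteLine(TryParsePairs("name test +company +phone 3434", out _));                  // False
```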

Upvotes: 0

Tanveer Badar

Reputation: 5523

Instead of trying to come up with a "works always" regex, why not rephrase the solution without regular expressions at all?

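using System;
using System.Linq;

var text = "name test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz";
var parts = text.Split('+');
var matches = parts.All(p => 
{
   // The first word is the key; every following word belongs to the value
   // (so "company abc def" is accepted, not just two-word phrases).
   var kvp = p.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
   if( kvp.Length < 2 )
       return false;
   return kvp[0].All(char.IsLetter) && kvp.Skip(1).All(w => w.All(char.IsLetterOrDigit));
});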

This will cause a lot of allocations if you need to process large amounts of text, but it should be fine otherwise.
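If the allocations do matter, a single pass over the characters avoids Split and the intermediate arrays entirely. This is a sketch under my own assumptions about the rules: each '+'-separated segment is a letters-only key followed by at least one letter/digit word, so that the question's "company abc def" is accepted. The name IsValid is illustrative.

```csharp
using System;

// Single pass, no Split and no intermediate strings.
bool IsValid(string text)
{
    int words = 0;          // words seen so far in the current segment
    bool inWord = false;    // true while the scan is inside a word

    foreach (char c in text)
    {
        if (c == ' ')
        {
            inWord = false;
        }
        else if (c == '+')
        {
            if (words < 2) return false;    // a segment needs a key and a value
            words = 0;
            inWord = false;
        }
        else
        {
            if (!inWord) { words++; inWord = true; }
            // The first word is the key (letters only); later words may contain digits.
            bool ok = words == 1 ? char.IsLetter(c) : char.IsLetterOrDigit(c);
            if (!ok) return false;
        }
    }
    return words >= 2;      // the last segment has no trailing '+'
}

Console.WriteLine(IsValid("name test +company abc def +phone 3434 +vehicle test + interested yyy +invited zzz")); // True
Console.WriteLine(IsValid("name te%st +company +phone 3434 +vehicle test + interested yyy +invited zzz"));       // False
```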

Upvotes: 0
