maraaaaaaaa

Reputation: 8173

Why does my RegEx get two different results depending on platform?

I have a RegEx pattern:

@"((?(?!\.\d)\D)*)(\d*\.\d+|\d+)*((?(?<=\d).*))"

designed to break a string into three parts. Given the strings

"asdf1234asdf"
"asdf .1234asdf"
"asdf. .1234asdf"
"asdf 12.34asdf"
"asdf123.4 asdf"
"asdf.1234asdf"

I need:

1. "asdf"     2. "1234"    3. "asdf"
1. "asdf "    2. ".1234"   3. "asdf"
1. "asdf. "   2. ".1234"   3. "asdf"
1. "asdf "    2. "12.34"   3. "asdf"
1. "asdf"     2. "123.4"   3. " asdf"
1. "asdf"     2. ".1234"   3. "asdf"

But the results differ depending on the platform I use.

Regex101.com gives me the results I need.

However, on Regexstorm.com I have to change the if (conditional) group in the regex to a non-capturing group for it to work.

That is, I need to change it from

@"((?(?!\.\d)\D)*)(\d*\.\d+|\d+)*((?(?<=\d).*))"

to

@"((?:(?!\.\d)\D)*)(\d*\.\d+|\d+)*((?(?<=\d).*))"

to get it to work in .NET.
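For reference, here is a minimal C# sketch that runs the modified (non-capturing) pattern above over the six sample strings and prints the three groups. It assumes C# 9 or later (top-level statements), and SplitParts is a hypothetical helper name used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The workaround pattern from the question: the first conditional is replaced by (?:...).
var rx = new Regex(@"((?:(?!\.\d)\D)*)(\d*\.\d+|\d+)*((?(?<=\d).*))");

// Hypothetical helper: returns groups 1-3 of the first match as a tuple.
(string, string, string) SplitParts(string input)
{
    Match m = rx.Match(input);
    return (m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);
}

string[] samples =
{
    "asdf1234asdf", "asdf .1234asdf", "asdf. .1234asdf",
    "asdf 12.34asdf", "asdf123.4 asdf", "asdf.1234asdf",
};

foreach (string s in samples)
{
    var (p1, p2, p3) = SplitParts(s);
    Console.WriteLine($"1. \"{p1}\"  2. \"{p2}\"  3. \"{p3}\"");
}
```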

So why do I need to get rid of the 'if' block? Does .NET not support if blocks (conditionals)?

Upvotes: 4

Views: 552

Answers (2)

user557597

Evidently, Dot-Net doesn't handle assertion conditionals correctly.
I wouldn't use this type of conditional for anything.

Dot-Net does, however, handle expressional conditionals very well.
All you have to do is place any group of constructs inside the conditional's condition group.

Example: (?( expressional construct ) .. | ..)

So putting an assertion inside there works just fine.
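A minimal C# sketch of the expressional form, assuming C# 9+ top-level statements. The pattern ^(?((?=\d))\d+|[a-z]+)$ is a made-up illustration, not taken from the question: the lookahead is wrapped in its own group and used as the condition, so a string that starts with a digit must be all digits, and anything else must be all lowercase letters.

```csharp
using System;
using System.Text.RegularExpressions;

// Expression conditional: (?( (assertion) ) yes | no ).
// The lookahead (?=\d) sits inside its own group, which forms the condition.
var rx = new Regex(@"^(?((?=\d))\d+|[a-z]+)$");

foreach (string s in new[] { "123", "abc", "12a", "a12" })
{
    Console.WriteLine($"{s}: {rx.IsMatch(s)}");
}
```

Once the condition has been evaluated, only the chosen branch is tried, so "12a" does not fall back to the [a-z]+ branch and fails to match.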


Note that Dot-Net is the only engine that supports expressional conditionals.
It's probably just as well, since those are the only conditionals it does correctly.


Formatted:

 # @"((?((?!\.\d))\D)*)(\d*\.\d+|\d+)((?((?<=\d)).*))"

 (                             # (1 start)
      (?(
           (?! \. \d )
        )
           \D 
      )*
 )                             # (1 end)
 ( \d* \. \d+ | \d+ )          # (2)
 (                             # (3 start)
      (?(
           (?<= \d )
        )
           .* 
      )
 )                             # (3 end)
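The commented layout above should also work directly as a .NET pattern when compiled with RegexOptions.IgnorePatternWhitespace, which makes the engine ignore unescaped whitespace and treat # through end of line as a comment. A sketch assuming C# 9+ top-level statements, checked here against only one of the sample strings:

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the one-line version, written in free-spacing (x) mode.
var rx = new Regex(@"
    (                              # (1 start)
        (?(
            (?! \. \d )
          )
            \D
        )*
    )                              # (1 end)
    ( \d* \. \d+ | \d+ )           # (2)
    (                              # (3 start)
        (?(
            (?<= \d )
          )
            .*
        )
    )                              # (3 end)
", RegexOptions.IgnorePatternWhitespace);

Match m = rx.Match("asdf 12.34asdf");
Console.WriteLine($"\"{m.Groups[1].Value}\" \"{m.Groups[2].Value}\" \"{m.Groups[3].Value}\"");
```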

C#:

using System;
using System.Text.RegularExpressions;

string[] sAAA = {
    "asdf1234asdf",
    "asdf .1234asdf",
    "asdf. .1234asdf",
    "asdf 12.34asdf",
    "asdf123.4 asdf",
    "asdf.1234asdf",
};

// Each assertion is wrapped in its own group, so it is used as an expression condition.
Regex RxAAA = new Regex(@"((?((?!\.\d))\D)*)(\d*\.\d+|\d+)((?((?<=\d)).*))");

for (int i = 0; i < sAAA.Length; i++)
{
    Match _mAAA = RxAAA.Match(sAAA[i]);
    if (_mAAA.Success)
    {
        // Print the three captured parts: prefix, number, suffix.
        Console.WriteLine("1. = \"{0}\", \t2. = \"{1}\", \t3. = \"{2}\"",
            _mAAA.Groups[1].Value, _mAAA.Groups[2].Value, _mAAA.Groups[3].Value);
    }
}

Output:

1. = "asdf",    2. = "1234",    3. = "asdf"
1. = "asdf ",   2. = ".1234",   3. = "asdf"
1. = "asdf. ",  2. = ".1234",   3. = "asdf"
1. = "asdf ",   2. = "12.34",   3. = "asdf"
1. = "asdf",    2. = "123.4",   3. = " asdf"
1. = "asdf",    2. = ".1234",   3. = "asdf"

Upvotes: 0

Sam

Reputation: 20486

RegEx is more similar to English than it is to C#. It's a language used to define patterns that find matches within strings. Each language has to implement its own regular expression engine, so most of them differ in places even though the concepts stay mostly the same. Usually, the more complicated the expression, the less likely it is to be portable across platforms. That's why SO users always ask which programming language you use when a vague RegEx question is posted.

This is why tools like RegEx101 offer multiple "flavors" so you can test an expression thoroughly. You'll also notice that the "Quick Reference" content (a cheat sheet of tokens, quantifiers, etc.) changes as you switch between engines.

Wikipedia: Comparison of regular expression engines.
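As a small illustration of flavor differences (not the cause of the problem in the question), even .NET alone has two behaviors for \d: by default it matches any Unicode decimal digit, while RegexOptions.ECMAScript restricts it to [0-9]. A sketch assuming C# 9+ top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit (category Nd).
string arabicThree = "\u0663";

// Default .NET flavor: \d matches any Unicode decimal digit.
bool netDefault = Regex.IsMatch(arabicThree, @"^\d$");

// ECMAScript-compliant flavor: \d matches only the ASCII digits 0-9.
bool ecmaScript = Regex.IsMatch(arabicThree, @"^\d$", RegexOptions.ECMAScript);

// prints: default: True, ECMAScript: False
Console.WriteLine($"default: {netDefault}, ECMAScript: {ecmaScript}");
```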

Upvotes: 5
