Can you convert C++ std::regex expressions to C# expressions?

Question

I have C# code that finds matches with a regex and replaces each one with i# (i1, i2, and so on). However, the C++ expressions don't seem to behave the same way in C#. Please convert them or give me some tips.

I work in Visual Studio Express 2012. It looks like \\ is needed in C# regex strings too.

Expressions:

// a letter followed by optional letters or digits, with no digit before the first letter
"(?:^|[^\\d])\\b([a-zA-Z][a-zA-Z\\d]*)"
// exponential number such as 1.10E+5
"\\d(\\.?\\d+)?E[+-]\\d+"
// the next two are pretty obvious
"\\d+\\.\\d+"
"\\d+"

C# code:

string input = "FGS1=(B+A*(5.01E+10))+A*10+(C*10.5)*51E-10+5.01E+10";
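Regex r = new Regex(rExp); // rExp is one of the expressions above
var identifiers = new Dictionary<string, string>();
int i = 0; // counter for the generated names (assumed to start at 0)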

MatchEvaluator me = delegate(Match m)
{
    Console.WriteLine(m);
    var variableName = m.ToString();

    if (identifiers.ContainsKey(variableName))
    {
        return identifiers[variableName];
    }
    else
    {
        i++;
        var newVariableName = "i" + i.ToString();
        identifiers[variableName] = newVariableName;
        return newVariableName;
    }
};

input = r.Replace(input, me);

t3chb0t · Accepted Answer

Yes and no. You don't have to convert the regular expressions from std::regex into C#. All you need is to tell C# to use a different behavior. Here's why and how:

In C#:

By default, the regular expression engine uses canonical behavior when matching a regular expression pattern to input text.

Regular Expression Options

std::regex, by contrast:

By default, the functions in this library use the ECMAScript grammar.

C++ Reference - regex
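To illustrate the quoted default, here is a minimal C++ sketch that builds a std::regex without any flags, so the ECMAScript grammar applies. The helper name find_all is hypothetical and not part of the question or of any library; it only collects the matches so they can be compared with what C# returns for the same pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every match of `pattern` in `text`.
// No syntax flag is passed, so std::regex uses its default grammar,
// which is std::regex::ECMAScript.
std::vector<std::string> find_all(const std::string& text,
                                  const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> matches;
    // Walk over all non-overlapping matches from left to right.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        matches.push_back(it->str());
    }
    return matches;
}
```

Calling find_all with the input string from the question and the exponential pattern "\\d(\\.?\\d+)?E[+-]\\d+" should give the three exponential literals 5.01E+10, 51E-10 and 5.01E+10.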

To make the std::regex expression work in C# you need to use the RegexOptions Enumeration and set the ECMAScript option:

new Regex(pattern, RegexOptions.ECMAScript | RegexOptions.IgnoreCase);

Enables ECMAScript-compliant behavior for the expression. This value can be used only in conjunction with the IgnoreCase, Multiline, and Compiled values. The use of this value with any other values results in an exception.

ECMAScript vs. Canonical Matching Behavior

The behavior of ECMAScript and canonical regular expressions differs in three areas:

Character classes are specified differently in matching expressions. Canonical regular expressions support Unicode character categories by default. ECMAScript does not support Unicode.

A regular expression capture class with a backreference to itself must be updated with each capture iteration.

Ambiguities between octal escapes and backreferences are treated differently.

UPDATE:

Some comments suggest using a verbatim string in C# (instead of escaping everything). That wouldn't work, because:

Regular expression processing (with std::regex) is not as convenient in C++ as it is in languages such as Perl that have built-in regular expression support. One reason is escape sequences. To send a backslash \ to the regular expression engine, you have to type \\ in the source code. For example, consider these definitions.

C++ TR1 regular expressions

So the patterns as defined by the OP are correct.
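The escaping rule quoted above can be checked with a small C++ sketch. The function names digit_pattern and replace_digits are hypothetical and exist only for this illustration: the source literal "\\d" is stored as two characters, a backslash and the letter d, and that two-character string is what the regex engine receives.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the literal "\\d" in source code yields the
// two-character string \d, which is what the regex engine sees.
std::string digit_pattern() {
    return "\\d";
}

// Hypothetical helper: replaces every digit in `input` with an underscore,
// using the default (ECMAScript) grammar of std::regex.
std::string replace_digits(const std::string& input) {
    const std::regex re(digit_pattern());
    return std::regex_replace(input, re, std::string("_"));
}
```

Under these assumptions, replace_digits("123") yields "___", the same result as the C# and C++/CLI calls in the example that follows.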

Example:

C#:

var input = "123";
var pattern = "\\d";
var result1 = Regex.Replace(input, pattern, "_", RegexOptions.ECMAScript); // produces "___"
var result2 = RegexTest.Replace(input, pattern, "_"); // produces "___"

C++/CLI:

String^ RegexTest::Replace(String^ input, String^ pattern, String^ replacement) {
    using namespace Runtime::InteropServices;
    const char* p_input = (const char*)(Marshal::StringToHGlobalAnsi(input)).ToPointer();
    const char* p_pattern = (const char*)(Marshal::StringToHGlobalAnsi(pattern)).ToPointer();
    const char* p_replacement = (const char*)(Marshal::StringToHGlobalAnsi(replacement)).ToPointer();

    try {
        std::string _input(p_input);
        std::string _replacement(p_replacement);
        std::regex re = std::regex(p_pattern);

        std::string result = std::regex_replace(_input, re, _replacement);
        return gcnew String(result.c_str());
    } finally {
        Marshal::FreeHGlobal(IntPtr((void*)p_pattern));
        Marshal::FreeHGlobal(IntPtr((void*)p_input));
        Marshal::FreeHGlobal(IntPtr((void*)p_replacement));
    }
}
