Hugo Cantacuzene

Reputation: 85

Regex and non-capturing groups in C#

I am having an issue with regular expressions in C#. I've been using these patterns in F# and they work fine, so I don't understand why they wouldn't work in C#.

Let's say I've got a multiline input file that I need to parse for specific data.

Example:

    Lorem ipsum dolor sit amet, consectetur adipiscing elit (Token1 : 42)
    Aliquam id ante ut ante tempus fringilla Token2 (ante ut ) : 45
    Morbi varius adipiscing lacus, eget pellentesque tellus vulputate Token3 :  43

I basically need to retrieve the numbers written after Token1, Token2 and Token3, each in a single match (i.e. I just want the number as the result). The patterns I used in F# are the following:

PatternToken1 = "(?:Token1 : )(\d+)"
PatternToken2 = "(?:Token2.* : )(\d+)"
PatternToken3 = "(?:Token3 : )(\d+)"

My issue is the following: matching these patterns against my input string in F# gives me these results:

 MatchedToken1 = 42
 MatchedToken2 = 45
 MatchedToken3 = 43

In C# I get these results:

 MatchedToken1 = Token1 : 42
 MatchedToken2 = Token2 (ante ut ) : 45
 MatchedToken3 = Token3 :  43

How come this works in F# and not in C#? What kind of pattern must I use for it to work in C#?

EDIT: Here is the code I use to match my patterns in C#:

abstract class PatternMatcherBaseEntity<T>
{
    protected Regex Pattern;
    protected T Match;


    private static TK Convert<TK>(string input)
    {
        TK res=default(TK);
        var converter = TypeDescriptor.GetConverter(typeof(TK));
        if(converter != null)
        {
            try
            {
                res = (TK) converter.ConvertFromString(input);
            }
            catch (Exception)
            {
                res = default(TK);
            }

        }
        return res;
    }


    protected bool Matcher(string s)
    {
        var res = false;
        //var matchedData = Regex.Match(s, Patterm);
        var content = Pattern.Matches(s);
        if(content.Count>0)
        {
            //Match = Convert<T>(content.Value);
            Match = Convert<T>(content[0].Value);
            res = true;
        }
        return res;
    }

    public T MatchGetter(String stringToMatch)
    {
        T ret = default(T);
        if(stringToMatch != String.Empty)
        {
            ret = stringToMatch.Match()
            .With(Matcher, x => Match)
            .Else(x => default(T))
            .Do();
        }
        return ret;
    }
}

By the way, I've tested with both verbatim strings and escaped strings; it would not compile otherwise.

Upvotes: 0

Views: 182

Answers (4)

Brian

Reputation: 2778

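In C# you'll want (?:) to match but not include in the match result. Note that this only keeps the prefix out of the numbered groups; Match.Value still holds the whole match, so the number has to be read from Groups[1]:

Regex.Match(str, @"(?:Token1) : (\d+)"); // Value = "Token1 : 42", Groups[1].Value = "42"
Regex.Match(str, @"(?:Token2).* : (\d+)"); // Value = "Token2 (ante ut ) : 45", Groups[1].Value = "45"
Regex.Match(str, @"(?:Token3).+:.+\d+"); // no capturing group here, so only Value ("Token3 :  43") is available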

EDIT: I accidentally had a stray paren in there (thanks to the commenter for pointing it out). I also completely misunderstood the point of the question: I thought the OP wanted the match to include the word. The funny part is that I even open the answer with "to match but not include". Not sure what I was thinking. Anyway, here is new code, copied and pasted this time to avoid the extra paren:

    string str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit (Token1 : 42)      Aliquam id ante ut ante tempus fringilla Token2 (ante ut ) : 45      Morbi varius adipiscing lacus, eget pellentesque tellus vulputate Token3 :  43  ";
    Match m1 = Regex.Match(str, @"(?<=Token1 : +)\d+");
    Match m2 = Regex.Match(str, @"(?<=Token2.* : +)\d+");
    Match m3 = Regex.Match(str, @"(?<=Token3 : +)\d+");
    MatchCollection mAll = Regex.Matches(str, @"(?<=Token\d[^\:]+: +)\d+");

Upvotes: 0

stema

Reputation: 93026

(?:Token1 : )(\d+)
             ^   ^

With these parentheses you are creating a capturing group: whatever is matched inside them is stored as group 1.

You are using it like this:

var content = Pattern.Matches(s);

Matches returns a MatchCollection with one Match per occurrence of the pattern in the input, where

content[0].Value contains the complete text of the first match

content[0].Groups[1].Value contains the part of that match captured by group 1

and here

Match = Convert<T>(content[0].Value);

you are reading the complete match instead of the captured group.

Your result is in group 1, so you need to read group 1:

Match = Convert<T>(content[0].Groups[1].Value);
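
As a rough illustration of the difference between the whole match and group 1, here is a minimal, self-contained sketch. The class name GroupDemo and the helper FirstGroup are made up for this example and are not part of the OP's code; the input is the first sample line from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Hypothetical helper: returns the text captured by group 1,
    // or an empty string when the pattern does not match at all.
    public static string FirstGroup(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        const string line = "Lorem ipsum dolor sit amet, consectetur adipiscing elit (Token1 : 42)";
        const string pattern = @"(?:Token1 : )(\d+)";

        Match m = Regex.Match(line, pattern);

        Console.WriteLine(m.Value);            // whole match: Token1 : 42
        Console.WriteLine(m.Groups[0].Value);  // group 0 is the same as m.Value
        Console.WriteLine(m.Groups[1].Value);  // captured digits only: 42
        Console.WriteLine(FirstGroup(line, pattern));
    }
}
```

The non-capturing prefix is still consumed by the match, which is why Value contains it; only the numbering of the groups is affected.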

Upvotes: 1

Dave Cluderay

Reputation: 7426

Try using a lookbehind, which checks the prefix without including it in the match:

PatternToken1 = @"(?<=Token1 : )(\d+)"
PatternToken2 = @"(?<=Token2.* : )(\d+)"
PatternToken3 = @"(?<=Token3 : )(\d+)"
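
A small sketch of how the lookbehind variant behaves on the sample lines from the question. The class name LookbehindDemo and the helper NumberAfter are hypothetical. One assumption to flag: the third sample line has two spaces after the colon, so this sketch uses " +" inside the lookbehind there; with exactly one space the pattern would not match that line as written.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Hypothetical helper: Value of the first match, or an empty string if none.
    public static string NumberAfter(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : string.Empty;
    }

    static void Main()
    {
        string[] lines =
        {
            "Lorem ipsum dolor sit amet, consectetur adipiscing elit (Token1 : 42)",
            "Aliquam id ante ut ante tempus fringilla Token2 (ante ut ) : 45",
            "Morbi varius adipiscing lacus, eget pellentesque tellus vulputate Token3 :  43"
        };

        // The lookbehind only asserts what precedes the digits,
        // so Value holds the number alone and no group index is needed.
        Console.WriteLine(NumberAfter(lines[0], @"(?<=Token1 : )\d+"));    // 42
        Console.WriteLine(NumberAfter(lines[1], @"(?<=Token2.* : )\d+"));  // 45
        Console.WriteLine(NumberAfter(lines[2], @"(?<=Token3 : +)\d+"));   // 43
    }
}
```

.NET allows variable-length patterns such as ".*" inside a lookbehind; many other regex engines do not, so this approach may not port unchanged.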

Upvotes: 1

Polyfun

Reputation: 9639

I don't know F#, but in C# you need to escape backslashes by doubling them (\\) or use the @ verbatim string prefix:

PatternToken1 = "(?:Token1 : )(\\d+)";
PatternToken2 = @"(?:Token2.* : )(\d+)";
PatternToken3 = @"(?:Token3 : )(\d+)";
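
To make the two spellings concrete, here is a short sketch showing that the escaped and verbatim literals produce the same pattern string. The class name EscapingDemo and its constants are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapingDemo
{
    // Regular string literal: the backslash must be doubled.
    public const string Escaped = "(?:Token1 : )(\\d+)";

    // Verbatim string literal: the backslash is taken as-is.
    public const string Verbatim = @"(?:Token1 : )(\d+)";

    static void Main()
    {
        // Both literals denote the same characters.
        Console.WriteLine(Escaped == Verbatim);  // True

        Match m = Regex.Match("(Token1 : 42)", Verbatim);
        Console.WriteLine(m.Value);              // Token1 : 42
        Console.WriteLine(m.Groups[1].Value);    // 42
    }
}
```

Either spelling gives the same regex, so the choice of literal does not change what Value and Groups[1] contain.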

Upvotes: 0
