Reputation: 85
I am having an issue with regular expressions in C#. I've been using these patterns in F# and they work fine, so I don't understand why they wouldn't work in C#.
Let's say I've got a multiline input file. I need to parse this file for specific data.
Example:
Lorem ipsum dolor sit amet, consectetur adipiscing elit (Token1 : 42)
Aliquam id ante ut ante tempus fringilla Token2 (ante ut ) : 45
Morbi varius adipiscing lacus, eget pellentesque tellus vulputate Token3 : 43
I basically need to retrieve the numbers written after Token1, Token2 and Token3, each in a single match (i.e. I just want the number as the result). The patterns I used in F# are the following:
PatternToken1 = "(?:Token1 : )(\d+)"
PatternToken2 = "(?:Token2.* : )(\d+)"
PatternToken3 = "(?:Token3 : )(\d+)"
My issue is the following: matching these patterns against my input string in F# gives me these results:
MatchedToken1 = 42
MatchedToken2 = 45
MatchedToken3 = 43
In C# I get the following results:
MatchedToken1 = Token1 : 42
MatchedToken2 = Token2 (ante ut ) : 45
MatchedToken3 = Token3 : 43
How come this works in F# and not in C#? What kind of pattern must I use for it to work in C#?
EDIT: Here is the code I use to match my patterns in C#:
abstract class PatternMatcherBaseEntity<T>
{
    protected Regex Pattern;
    protected T Match;

    private static TK Convert<TK>(string input)
    {
        TK res = default(TK);
        var converter = TypeDescriptor.GetConverter(typeof(TK));
        if (converter != null)
        {
            try
            {
                res = (TK) converter.ConvertFromString(input);
            }
            catch (Exception)
            {
                res = default(TK);
            }
        }
        return res;
    }

    protected bool Matcher(string s)
    {
        var res = false;
        //var matchedData = Regex.Match(s, Patterm);
        var content = Pattern.Matches(s);
        if (content.Count > 0)
        {
            //Match = Convert<T>(content.Value);
            Match = Convert<T>(content[0].Value);
            res = true;
        }
        return res;
    }

    public T MatchGetter(String stringToMatch)
    {
        T ret = default(T);
        if (stringToMatch != String.Empty)
        {
            ret = stringToMatch.Match()
                .With(Matcher, x => Match)
                .Else(x => default(T))
                .Do();
        }
        return ret;
    }
}
By the way, I've tested using both verbatim strings and escaped strings; it would not compile otherwise.
Upvotes: 0
Views: 182
Reputation: 2778
In C# you'll want (?:) to match but not include in the match result:
Regex.Match(str, @"(?:Token1) : (\d+)"); // result = 42
Regex.Match(str, @"(?:Token2).* : )(\d+)"); // result = 45
Regex.Match(str, @"(?:Token3).+:.+\d+"); // result = 43
EDIT: I accidentally had a stray paren in there (thanks to the commenter for pointing it out). I also completely misunderstood the point of the question: I thought the OP wanted the match to include the word. The funny part is that I even opened my answer with "to match but not include". Not sure what I was thinking. Anyway, here is new code, copied and pasted this time to avoid the extra paren:
string str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit (Token1 : 42) Aliquam id ante ut ante tempus fringilla Token2 (ante ut ) : 45 Morbi varius adipiscing lacus, eget pellentesque tellus vulputate Token3 : 43 ";
Match m1 = Regex.Match(str, @"(?<=Token1 : +)\d+");
Match m2 = Regex.Match(str, @"(?<=Token2.* : +)\d+");
Match m3 = Regex.Match(str, @"(?<=Token3 : +)\d+");
MatchCollection mAll = Regex.Matches(str, @"(?<=Token\d[^\:]+: +)\d+");
Upvotes: 0
Reputation: 93026
(?:Token1 : )(\d+)
             ^   ^
These brackets create a capturing group: the part of the match inside them is stored in group 1.
You use the pattern like this:
var content = Pattern.Matches(s);
Matches returns a MatchCollection, in which every element is a Match for one occurrence of the pattern. For the first occurrence,
content[0].Value
contains the complete matched string, and
content[0].Groups[1].Value
contains the part matched by group 1.
Here
Match = Convert<T>(content[0].Value);
you are reading the complete match instead of the group.
Your result is in group 1, so you need to read group 1:
Match = Convert<T>(content[0].Groups[1].Value);
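To make the difference between the whole match and a capturing group concrete, here is a minimal sketch. The class name `GroupDemo` and the variable names are only illustrative; the input line and the pattern are taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupDemo
{
    static void Main()
    {
        // Sample line and pattern from the question.
        string input = "Lorem ipsum dolor sit amet, consectetur adipiscing elit (Token1 : 42)";
        var pattern = new Regex(@"(?:Token1 : )(\d+)");

        MatchCollection content = pattern.Matches(input);
        if (content.Count > 0)
        {
            Match first = content[0];

            // The whole match still includes the text consumed by the
            // non-capturing group (?: ... ).
            Console.WriteLine(first.Value);           // Token1 : 42

            // Group 1 holds only what (\d+) matched.
            Console.WriteLine(first.Groups[1].Value); // 42
        }
    }
}
```

A non-capturing group only avoids creating a numbered group; the text it matches is still part of `Match.Value`.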
Upvotes: 1
Reputation: 7426
Try using the following:
PatternToken1 = "(?<=Token1 : )(\d+)"
PatternToken2 = "(?<=Token2.* : )(\d+)"
PatternToken3 = "(?<=Token3 : )(\d+)"
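As a rough check of the lookbehind patterns above, the sketch below runs each one against the matching sample line from the question. It assumes the patterns are written as verbatim strings (`@"..."`) so that `\d` compiles in C#; the class name `LookbehindDemo` is only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

class LookbehindDemo
{
    static void Main()
    {
        // Sample lines from the question, one per token.
        string[] lines =
        {
            "Lorem ipsum dolor sit amet, consectetur adipiscing elit (Token1 : 42)",
            "Aliquam id ante ut ante tempus fringilla Token2 (ante ut ) : 45",
            "Morbi varius adipiscing lacus, eget pellentesque tellus vulputate Token3 : 43",
        };

        // The lookbehind (?<= ... ) must match just before the digits,
        // but its text is not included in the match value.
        string[] patterns =
        {
            @"(?<=Token1 : )(\d+)",
            @"(?<=Token2.* : )(\d+)",
            @"(?<=Token3 : )(\d+)",
        };

        for (int i = 0; i < lines.Length; i++)
        {
            Match m = Regex.Match(lines[i], patterns[i]);
            Console.WriteLine(m.Success ? m.Value : "(no match)");
        }
        // Prints 42, 45 and 43 on separate lines.
    }
}
```

With this approach `Match.Value` is already the bare number, so the question's existing `content[0].Value` call would not need to change.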
Upvotes: 1
Reputation: 9639
I don't know F#, but in C# you need to escape backslashes by doubling them (\\), or use the @ verbatim string prefix:
PatternToken1 = "(?:Token1 : )(\\d+)";
PatternToken2 = @"(?:Token2.* : )(\d+)";
PatternToken3 = @"(?:Token3 : )(\d+)";
Upvotes: 0