Reputation: 381
I have a regex:
var topRateMatch = Regex.Match(result, @"\bTop Rate\b.*?\s*\s*([\d,\.]+)", RegexOptions.IgnoreCase);
And then I parse the captured group into an int:
topRate = int.Parse(topRateMatch.Groups[1].Value, System.Globalization.NumberStyles.AllowThousands);
Example)
Top Rate: 888,888
Output: 888888
I'm getting the int output just fine with my current regex. However, I noticed that when there is whitespace between the digits, for example,
Top Rate: 8 88,888
I only get an 8. Is there a way to ignore any whitespace that may or may not exist between the digits or after the Top Rate label?
Example)
Top Rate: 8 88,888
Expected output: 888888
Top Rate: 8 8 8,888
Expected output: 888888
Top Rate: 888, 8 88
Expected output: 888888
Upvotes: 3
Views: 581
Reputation: 627607
First of all, you cannot skip or omit whitespace while matching and capturing the number in a single group. The only way to do that within the regex itself would be to extract several matches after the given string. However, there is an easy two-step approach.
You may add \s to match any whitespace, or \p{Zs} and \t to match any horizontal whitespace, to the character class. I would recommend capturing the number with \d first and then using an optional non-capturing group with a digit pattern at the end, to make sure the captured number starts and ends with a digit:
\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)
See the regex demo. Note that repeating \s*\s* makes little sense: \s* already matches zero or more whitespace chars, and even a single \s* is redundant here because of .*?, which matches any zero or more chars other than LF chars, as few as possible. To make it match across lines, add the RegexOptions.Singleline option.
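The effect of RegexOptions.Singleline on .*? can be checked with a short sketch. The SinglelineDemo class and its Matches and Run members are hypothetical names used only for illustration; the pattern is the one suggested in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Pattern suggested in this answer.
    private const string Pattern = @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)";

    // Returns true when the pattern finds a number after "Top Rate".
    public static bool Matches(string input, RegexOptions options) =>
        Regex.IsMatch(input, Pattern, options);

    // Call this from your own Main to print both results.
    public static void Run()
    {
        // The label and the number sit on different lines.
        var text = "Top Rate:\n8 88,888";
        Console.WriteLine(Matches(text, RegexOptions.None));
        Console.WriteLine(Matches(text, RegexOptions.Singleline));
    }
}
```

Without Singleline the dot cannot cross the line feed, so the first call finds nothing. With Singleline the same input matches.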
Details:
\bTop Rate\b - a whole word Top Rate
.*? - any zero or more chars other than a newline char, as few as possible
(\d(?:[\d,.\s]*\d)?) - Group 1:
    \d - a digit
    (?:[\d,.\s]*\d)? - an optional non-capturing group matching zero or more digits, commas, dots or whitespace chars, and then a digit.
Next, when you get the match, only keep the digits.
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "Top Rate: 8 88,888";
var result = Regex.Match(text, @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)", RegexOptions.Singleline);
if (result.Success)
{
    // Keep only the digit chars of the captured value.
    Console.WriteLine(new string(result.Groups[1].Value.Where(c => char.IsDigit(c)).ToArray()));
}
See the C# demo. With multiple matching:
var text = "Top Rate: 8 88,888 and Top Rate: 8 \n 88,888";
var results = Regex.Matches(text, @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => new string(x.Groups[1].Value.Where(c => char.IsDigit(c)).ToArray()));
foreach (var s in results)
{
    Console.WriteLine(s);
}
See this C# demo.
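Since the question ultimately wants an int, the two steps can be folded into one helper. This is only a sketch and not part of the original answer: the TopRateParser class and the TryParseTopRate method are hypothetical names, RegexOptions.IgnoreCase is carried over from the question, and int.TryParse is used on the assumption that an overlong digit run should report failure rather than throw.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TopRateParser
{
    // Same pattern as in this answer; IgnoreCase comes from the question.
    private static readonly Regex TopRate = new Regex(
        @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Step 1: capture the digit run together with its separators.
    // Step 2: drop everything that is not a digit, then parse.
    public static bool TryParseTopRate(string input, out int value)
    {
        value = 0;
        var m = TopRate.Match(input);
        if (!m.Success) return false;

        var digits = new string(m.Groups[1].Value.Where(c => char.IsDigit(c)).ToArray());
        return int.TryParse(digits, out value);
    }
}
```

Calling TopRateParser.TryParseTopRate("Top Rate: 8 88,888", out var rate) returns true and sets rate to 888888. Note that dots are stripped along with commas and whitespace, so this only suits integer rates.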
Upvotes: 2
Reputation: 380
I verified that you can achieve your goal with a small change to the regex.
First one: (screenshot not preserved)
Second one: (screenshot not preserved)
Upvotes: 0
Reputation: 74385
Something like this?
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string[] texts = {
            "This should Not match the Top Rate thing",
            " Top Rate : 888,888 ",
            "Top Rate : 8 8 8 , 8 8 8 ",
        };
        Regex rxNonDigit = new Regex(@"\D+"); // matches 1 or more characters other than decimal digits.
        Regex rxTopRate = new Regex(@"
            ^               # match start of line, followed by
            \s*             # zero or more lead-in whitespace characters, followed by
            Top             # the literal 'Top', followed by
            \s+             # 1 or more whitespace characters, followed by
            Rate            # the literal 'Rate', followed by
            \s*             # zero or more whitespace characters, followed by
            :               # a literal colon ':', followed by
            \s*             # zero or more whitespace characters, followed by
            (?<rate>        # a named (explicit) capture group, containing
              \d+           # - 1 or more decimal digits, followed by
              (             # - an unnamed group, containing
                (\s|,)+     #   - interstitial whitespace or a comma, followed by
                \d+         #   - 1 or more decimal digits
              )*            #   the whole of which is repeated zero or more times
            )               # followed by
            \s*             # zero or more lead-out whitespace characters, followed by
            $               # end of line
            ", RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture );
        foreach ( string text in texts )
        {
            Match m = rxTopRate.Match(text);
            if (!m.Success)
            {
                Console.WriteLine("No Match: '{0}'", text);
            }
            else
            {
                string rawValue = m.Groups["rate"].Value;
                string cleanedValue = rxNonDigit.Replace(rawValue, "");
                Decimal value = Decimal.Parse(cleanedValue);
                Console.WriteLine(@"Matched: '{0}' >>> '{1}' >>> '{2}' >>> {3}",
                    text,
                    rawValue,
                    cleanedValue,
                    value
                );
            }
        }
    }
}
Upvotes: 0