Reputation: 11216
I am puzzled by the output of a Regex.Replace call.
I have a Dictionary<string, string>. I want to search the input string for each key and replace it with the corresponding value when the key appears at the beginning of the string, at the end of the string, or in the middle of the string surrounded by one or more spaces on each side.
My input string is as follows:
North S West N East W South E W S N West South
The regular expression in this code comes out as:
(^| +?)SOUTH($| +?)|(^| +?)NORTH($| +?)|(^| +?)EAST($| +?)|(^| +?)WEST($| +?)|(^| +?)E($| +?)|(^| +?)W($| +?)|(^| +?)N($| +?)|(^| +?)S($| +?)
My expected output is:
N SOUTH W NORTH E WEST S EAST WEST SOUTH NORTH W S
My actual output is:
N S W N E W S E WEST S NORTH WEST S
The code is below. The RegEx pattern is constructed from the keys of the dictionary. I feel I am just misunderstanding something simple about regular expressions. Why does it pick up some of the words but not all of them? For example, why does it not match the word West near the end of the string, but does match the word West near the beginning of the string? I have added code to write each of the matches and the pattern string but I am stumped.
void Main()
{
    var directions = new Dictionary<string, string>
    {
        {"SOUTH", "S"},
        {"NORTH", "N"},
        {"EAST", "E"},
        {"WEST", "W"},
        {"E", "EAST"},
        {"W", "WEST"},
        {"N", "NORTH"},
        {"S", "SOUTH"},
    };
    string input = @"North S West N East W South E W S N West South";
    Console.WriteLine(doReplace(input, directions));
}

private string doReplace(string input, Dictionary<string, string> lookup)
{
    string output = null;
    //Construct the regular expression pattern
    string searchPattern = string.Join(@"|", lookup.Select(s => @"(^| +?)" + s.Key + @"($| +?)").ToArray());
    Console.WriteLine(searchPattern);
    //Perform the replace
    output = Regex.Replace(input.ToUpper(), searchPattern, new MatchEvaluator(m =>
    {
        //Write out each match found
        Console.WriteLine("[{0}]", m.Value);
        string tmp = m.Value.Trim();
        string result = tmp;
        lookup.TryGetValue(tmp, out result);
        //This return statement is for the lambda not the method.
        return m.Value.Replace(tmp, result);
    }), RegexOptions.ExplicitCapture | RegexOptions.Singleline);
    return output;
}
Upvotes: 2
Views: 133
Reputation: 336128
Your problem is that the elements of your regex (unless the matches are at the start/end of the string) require at least one space before and after the match:
(^| +?)SOUTH($| +?)
matches a space, then SOUTH, then another space, and both spaces become part of the match. If the next potential match starts right after that, it would need a second space character of its own to begin with. But you only have single spaces between words, so at most every other word can match.
You can visualize this here, for example.
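The effect can also be reproduced in a few lines of C#. This is a minimal sketch, assuming the same upper-cased input and the same per-key pattern shape as in the question; the class name ConsumedSpacesDemo and the helper FindMatches are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class ConsumedSpacesDemo
{
    // Hypothetical helper: returns the raw text of every match, spaces included.
    public static List<string> FindMatches(string input)
    {
        string[] keys = { "SOUTH", "NORTH", "EAST", "WEST", "E", "W", "N", "S" };

        // Same construction as in the question: one alternative per key,
        // each one consuming the spaces around the key.
        string pattern = string.Join("|",
            keys.Select(k => @"(^| +?)" + k + @"($| +?)"));

        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.ExplicitCapture))
        {
            found.Add(m.Value);
        }
        return found;
    }

    static void Main()
    {
        string input = "NORTH S WEST N EAST W SOUTH E W S N WEST SOUTH";

        // Brackets make the consumed spaces visible.
        foreach (string value in FindMatches(input))
        {
            Console.WriteLine("[{0}]", value);
        }
    }
}
```

Only 7 of the 13 words are matched. Each match after the first begins with a space, and the word directly after a match is skipped because the space in front of it has already been consumed.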
If your goal is to match only entire words rather than substrings, use \b word boundary anchors. \bSOUTH\b will match SOUTH but not SOUTHERN.
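One way to apply this to the dictionary from the question is sketched below. It assumes all keys are upper case, as they are in the question. The names DirectionSwapper and SwapDirections are hypothetical, and collapsing the keys into a single alternation is a choice made here for brevity, not something the question's code requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class DirectionSwapper
{
    // Hypothetical helper: replaces every whole-word key with its value.
    // Assumes the keys in 'lookup' are upper case, as in the question.
    public static string SwapDirections(string input, IDictionary<string, string> lookup)
    {
        // Escape each key in case it contains regex metacharacters.
        string alternation = string.Join("|", lookup.Keys.Select(k => Regex.Escape(k)));

        // \b asserts a word boundary without consuming the neighbouring space,
        // so adjacent words can each match.
        string pattern = @"\b(?:" + alternation + @")\b";

        // The matched text is exactly one of the keys, so a direct lookup is safe.
        return Regex.Replace(input.ToUpperInvariant(), pattern, m => lookup[m.Value]);
    }

    static void Main()
    {
        var directions = new Dictionary<string, string>
        {
            {"SOUTH", "S"}, {"NORTH", "N"}, {"EAST", "E"}, {"WEST", "W"},
            {"E", "EAST"}, {"W", "WEST"}, {"N", "NORTH"}, {"S", "SOUTH"},
        };

        string input = "North S West N East W South E W S N West South";

        // Prints: N SOUTH W NORTH E WEST S EAST WEST SOUTH NORTH W S
        Console.WriteLine(SwapDirections(input, directions));
    }
}
```

Note that \b also treats punctuation as a boundary. If only spaces and the string ends should count as delimiters, the lookarounds (?<=^| ) and (?=$| ) in place of \b should give the stricter behaviour, since they check the neighbouring characters without consuming them.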
Upvotes: 3