Reputation: 1467
My problem comes down to "XML" parsing (in quotes because the input is not strictly XML) and whitespace.
I need to be able to convert this:
"This is a <tag>string</tag>"
into
"This is a {0}"
It must be able to handle nested tags, and that sort of thing. My plan was to use the following to get my replacement text.
var v = XDocument.Parse(string.Format("<root>{0}</root>", myString),LoadOptions.PreserveWhitespace);
var ns = v.DescendantNodes();
var n = "" + ns.OfType<XElement>().First(node => node.Name != "root");
That code returns the first pair of matching tags, and it handles nesting. The only real issue is that even with the "PreserveWhitespace" option, carriage returns are eliminated: "\r\n" is converted to just "\n". This prevents a match, so:
myString = myString.Replace(n,"{0}");
does not work as expected. So I am trying to come up with a way to get the replacement to work properly, ignoring whitespace, but I don't know how to begin... Thoughts?
Upvotes: 1
Views: 1924
Reputation: 104741
This method turns the match into a regex pattern and replaces it in the source string:
public static string ReplaceMatch(string source, string match, string replacement)
{
    if (string.IsNullOrWhiteSpace(source))
    {
        return string.Empty;
    }
    if (string.IsNullOrWhiteSpace(match))
    {
        return source;
    }
    // Attempt a plain ordinal replace first; it is enough when the whitespace already agrees
    if (source.Contains(match, StringComparison.Ordinal))
    {
        return source.Replace(match, replacement, StringComparison.Ordinal);
    }
    // Highly unlikely character sequence used as a temporary whitespace placeholder
    const string whitespaceSub = "\uFFFC\uFFF8\uFFF9";
    var pattern = WhitespaceRegex().Replace(match, replacement: whitespaceSub);
    pattern = Regex.Escape(pattern);
    pattern = pattern.Replace(whitespaceSub, WhitespaceRegexPattern, StringComparison.Ordinal);
    // Note: this also consumes any whitespace directly around the match
    const string optionalWhitespacePattern = @"\s*";
    pattern = $"{optionalWhitespacePattern}{pattern}{optionalWhitespacePattern}";
    return Regex.Replace(
        source,
        pattern,
        _ => replacement, // evaluator, so '$' in the replacement is taken literally
        RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant,
        matchTimeout: TimeSpan.FromMilliseconds(300));
}

private const string WhitespaceRegexPattern = @"\s+";

[GeneratedRegex(WhitespaceRegexPattern, RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant, matchTimeoutMilliseconds: 300)]
private static partial Regex WhitespaceRegex();
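For anyone who cannot use source-generated regexes, the same idea can be sketched more compactly: split the match on whitespace, escape each piece, and join the pieces with `\s+`. This is only a sketch under a few assumptions: the class and method names (`WhitespaceTolerant`, `Replace`) are made up for illustration, leading and trailing whitespace in the match is ignored, and no match timeout is set.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class WhitespaceTolerant
{
    public static string Replace(string source, string match, string replacement)
    {
        // A null separator array makes Split break on any whitespace.
        string[] tokens = match.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 0)
        {
            return source; // nothing meaningful to search for
        }

        // Escape each literal piece, then allow any run of whitespace between pieces.
        string pattern = string.Join(@"\s+", tokens.Select(Regex.Escape));

        // A MatchEvaluator keeps the replacement literal ('$' is not interpreted).
        return Regex.Replace(source, pattern, _ => replacement);
    }
}
```

With this, a match containing "\n" should still be found in a source that uses "\r\n", because both are covered by `\s+`.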
Upvotes: 0
Reputation: 43673
string s = "This <tag id=\"1\">string <inner><tag></tag></inner></tag> is <p>inside <b>of</b> another</p> string";
Match m;
do
{
    // The leading group is greedy, so each pass matches the last innermost tag pair
    m = Regex.Match(s, @"\A([\s\S]*)(<(\S+)[^<>]*>[^<>]*</\3>)([\s\S]*)\Z");
    if (m.Success)
    {
        s = m.Groups[1].Value + "{0}" + m.Groups[4].Value;
        System.Console.WriteLine("Match: " + m.Groups[2].Value);
    }
} while (m.Success);
System.Console.WriteLine("Result: " + s);
Match: <b>of</b>
Match: <p>inside {0} another</p>
Match: <tag></tag>
Match: <inner>{0}</inner>
Match: <tag id="1">string {0}</tag>
Result: This {0} is {0} string
Upvotes: 1
Reputation: 20320
Try a CDATA section?
v = XDocument.Parse(string.Format("<root><![CDATA[{0}]]></root>", myString));
I don't have anything handy to test with, but I suspect you will have to adjust the selector after it and get its child (the text node).
Upvotes: 0
Reputation: 8656
Although this is not the best solution (it breaks if myString contains a bare '\n'), it's worth a try:
myString = myString.Replace(n.Replace("\n", "\r\n"), "{0}");
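A variation that avoids guessing which line ending the source uses is to normalize both sides to "\n" before replacing. The sketch below assumes it is acceptable for the result to come back with "\n" line endings; `TagReplacer`, `NormalizeNewlines` and `ReplaceFirstElement` are illustrative names, not existing APIs. It still relies on the serialized element matching the source text, so differences such as attribute quoting or entity escaping can still prevent a match.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class TagReplacer
{
    // Converts "\r\n" and lone "\r" to "\n", the same normalization an XML parser applies.
    private static string NormalizeNewlines(string text) =>
        text.Replace("\r\n", "\n").Replace("\r", "\n");

    public static string ReplaceFirstElement(string myString, string placeholder = "{0}")
    {
        string source = NormalizeNewlines(myString);
        var doc = XDocument.Parse("<root>" + source + "</root>", LoadOptions.PreserveWhitespace);

        // First element below the wrapper root, including everything nested inside it.
        XElement first = doc.Root.Elements().FirstOrDefault();
        if (first == null)
        {
            return source; // no tags found
        }

        // Normalize the serialized text too, in case the writer emits platform line endings.
        string n = NormalizeNewlines(first.ToString(SaveOptions.DisableFormatting));
        return source.Replace(n, placeholder);
    }
}
```

If the original "\r\n" endings matter, they would have to be restored afterwards, which this sketch does not attempt.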
Upvotes: 0