nosirrahcd

Reputation: 1467

String Replace: Ignore Whitespace

The issue that I am having is the following.

The root cause of my issue is a combination of "XML" parsing (XML is in quotes because, in this case, the input is not strictly XML) and whitespace.

I need to be able to convert this:

 "This is a <tag>string</tag>"

into

 "This is a {0}"

It must be able to handle nested tags and similar cases. My plan was to use the following to get my replacement text:

 var v = XDocument.Parse(string.Format("<root>{0}</root>", myString),LoadOptions.PreserveWhitespace);
 var ns = v.DescendantNodes();
 var n = "" + ns.OfType<XElement>().First(node => node.Name != "root");

That code returns the first matching tag pair (the element together with its contents), and it handles nesting. The only real issue is that, even with the "PreserveWhitespace" option, carriage returns are removed: "\r\n" is converted to just "\n". As a result, n no longer occurs in myString, so:

 myString = myString.Replace(n,"{0}");

does not work as expected. So I am trying to come up with a way to make the replacement work while ignoring whitespace differences, but I don't know where to begin. Thoughts?
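A minimal reproduction of the line-ending problem, assuming the parser applies standard XML end-of-line normalization as described above. LineEndingDemo and ParseTagText are hypothetical names used only for illustration. The sketch compares the parsed text value rather than the serialized element, because XElement.ToString() indents its output by default and could introduce differences of its own.

```csharp
using System.Xml.Linq;

public static class LineEndingDemo
{
    // Hypothetical helper: wraps the input in <root>, parses it with
    // PreserveWhitespace (as in the question), and returns the text of <tag>.
    public static string ParseTagText(string myString)
    {
        var doc = XDocument.Parse("<root>" + myString + "</root>", LoadOptions.PreserveWhitespace);
        return doc.Root!.Element("tag")!.Value;
    }
}
```

If the parser normalizes line endings as described, ParseTagText("This is a <tag>first\r\nsecond</tag>") returns "first\nsecond", which is no longer a substring of the original input, so any string.Replace based on the parsed text finds nothing.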

Upvotes: 1

Views: 1924

Answers (4)

Shimmy Weitzhandler

Reputation: 104741

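This method first tries an exact (ordinal) replace. If the match is not found, it turns the match into a regex pattern in which every run of whitespace becomes \s+, so a "\n" in the match also matches "\r\n" in the source, and performs the replacement with that pattern. [GeneratedRegex] requires .NET 7 or later: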

using System;
using System.Text.RegularExpressions;

public static partial class StringReplaceExtensions
{
    public static string ReplaceMatch(string source, string match, string replacement)
    {
        if (string.IsNullOrWhiteSpace(source))
        {
            return string.Empty;
        }

        if (string.IsNullOrWhiteSpace(match))
        {
            return source;
        }

        // Attempt a regular (exact, ordinal) replace first
        if (source.Contains(match, StringComparison.Ordinal))
        {
            return source.Replace(match, replacement, StringComparison.Ordinal);
        }

        // An uncommon character sequence used as a temporary whitespace placeholder
        const string whitespaceSub = "\uFFFC\uFFF8\uFFF9";

        // Swap each whitespace run for the placeholder, escape everything else,
        // then turn the placeholder into \s+ so any whitespace run matches
        var pattern = WhitespaceRegex().Replace(match, replacement: whitespaceSub);
        pattern = Regex.Escape(pattern);
        pattern = pattern.Replace(whitespaceSub, WhitespaceRegexPattern, StringComparison.Ordinal);

        // Note: this also absorbs any whitespace directly around the match
        const string optionalWhitespacePattern = @"\s*";
        pattern = $"{optionalWhitespacePattern}{pattern}{optionalWhitespacePattern}";

        return Regex.Replace(
            source,
            pattern,
            replacement: replacement.Replace("$", "$$"), // "$" is special in .NET replacement strings
            RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant,
            matchTimeout: TimeSpan.FromMilliseconds(300));
    }

    private const string WhitespaceRegexPattern = @"\s+";

    [GeneratedRegex(WhitespaceRegexPattern, RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant, matchTimeoutMilliseconds: 300)]
    private static partial Regex WhitespaceRegex();
}
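The same idea can be reduced to a shorter sketch: split the match on whitespace, escape each literal piece with Regex.Escape, and join the pieces with \s+. WhitespaceTolerantReplace is a hypothetical name; unlike the method above, this version does not try an exact replace first and does not absorb whitespace around the match, so "This is a <tag>..." keeps its space before the placeholder.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceTolerantReplace
{
    // Replaces every occurrence of match in source, treating each whitespace
    // run in match as "one or more whitespace characters" in source.
    // Leading and trailing whitespace in match is ignored.
    public static string Replace(string source, string match, string replacement)
    {
        if (string.IsNullOrWhiteSpace(match))
        {
            return source;
        }

        // Split on whitespace runs and escape each literal piece.
        string[] pieces = Regex.Split(match.Trim(), @"\s+");
        string pattern = string.Join(@"\s+", pieces.Select(Regex.Escape));

        // "$" is special in .NET replacement strings, so escape it as "$$".
        return Regex.Replace(source, pattern, replacement.Replace("$", "$$"));
    }
}
```

For the question's case, Replace("This is a <tag>first\r\nsecond</tag>", "<tag>first\nsecond</tag>", "{0}") builds the pattern <tag>first\s+second</tag>, which matches across the "\r\n" and yields "This is a {0}".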

Upvotes: 0

Ωmega

Reputation: 43673

Input:

string s = "This <tag id=\"1\">string <inner><tag></tag></inner></tag> is <p>inside <b>of</b> another</p> string";

C# code:

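using System.Text.RegularExpressions;

// Each pass replaces the last innermost tag pair (content without '<' or '>')
// with {0}; the greedy leading group makes the search start from the end.
Match m;
do
{
  m = Regex.Match(s, @"\A([\s\S]*)(<(\S+)[^<>]*>[^<>]*</\3>)([\s\S]*)\Z");
  if (m.Success) {
    s = m.Groups[1].Value + "{0}" + m.Groups[4].Value;
    System.Console.WriteLine("Match: " + m.Groups[2].Value);
  }
} while (m.Success);
System.Console.WriteLine("Result: " + s);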

Output:

Match: <b>of</b>
Match: <p>inside {0} another</p>
Match: <tag></tag>
Match: <inner>{0}</inner>
Match: <tag id="1">string {0}</tag>
Result: This {0} is {0} string

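Because this approach works directly on the original string and never round-trips through an XML parser, line endings inside a tag survive unchanged. The sketch below wraps the same loop in a reusable method and also collects each replaced fragment (the text the question's n was meant to hold). TagTemplater and ToTemplate are hypothetical names, and the pattern is the one from the loop with the attribute part written as [^<>]*. Like any regex approach, it does not understand comments, CDATA, or '>' inside attribute values, and self-closing tags such as <br/> are left in place.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagTemplater
{
    // Group 1 = text before, group 2 = an innermost tag pair whose content
    // contains no '<' or '>', group 3 = tag name, group 4 = text after.
    private static readonly Regex InnermostTag =
        new Regex(@"\A([\s\S]*)(<(\S+)[^<>]*>[^<>]*</\3>)([\s\S]*)\Z");

    // Collapses tags into "{0}" and appends each replaced fragment, verbatim,
    // to fragments in the order the loop finds them (last innermost first).
    public static string ToTemplate(string s, List<string> fragments)
    {
        Match m;
        while ((m = InnermostTag.Match(s)).Success)
        {
            fragments.Add(m.Groups[2].Value);
            s = m.Groups[1].Value + "{0}" + m.Groups[4].Value;
        }
        return s;
    }
}
```

With the question's input containing "\r\n" inside the tag, the recorded fragment keeps the "\r\n", so it can still be located in (or restored to) the original string.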

Upvotes: 1

Tony Hopkinson

Reputation: 20320

Try a CDATA section?

v = XDocument.Parse(string.Format("<root><![CDATA[{0}]]></root>", myString));

Not got anything handy to test with, but I suspect you might have to mess about with the selector after it, and get its child (the text node).

Upvotes: 0

Nikola Davidovic

Reputation: 8656

It's not the best solution (it won't match if myString itself uses bare '\n' line endings), but it's worth a try:

myString =  myString.Replace(n.Replace("\n", "\r\n"), "{0}");
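A variation that sidesteps that caveat by normalizing line endings on both sides before replacing, mirroring the XML rule that turns both "\r\n" and a lone "\r" into "\n". NormalizedReplace and ReplaceNormalized are hypothetical names; note that the returned string uses "\n" line endings throughout, so convert it back afterwards if "\r\n" is required.

```csharp
public static class NormalizedReplace
{
    // Converts "\r\n" and any lone "\r" to "\n" (order matters: handle "\r\n" first).
    public static string NormalizeNewlines(string s) =>
        s.Replace("\r\n", "\n").Replace("\r", "\n");

    // Normalizes both the searched string and the text to find, so they agree
    // on line endings regardless of which convention each one started with.
    public static string ReplaceNormalized(string myString, string n, string replacement) =>
        NormalizeNewlines(myString).Replace(NormalizeNewlines(n), replacement);
}
```

For example, ReplaceNormalized("This is a <tag>first\r\nsecond</tag>", "<tag>first\nsecond</tag>", "{0}") returns "This is a {0}", and it works equally well when myString already uses bare "\n".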

Upvotes: 0
