bayo
bayo

Reputation: 210

String.Replace more occurrences of subsequent characters

Recently I used the String.Replace method to ensure that user input, which was later included into a HTML comment, is properly sanitized. I needed that input for later use so HttpUtility.HtmlEncode was not a choice.

What my code did was calling String.Replace("--", "- -") on the input. However, I realized that the Replace function did not behave as I expected. For example:

var userData = "----";
return userData.Replace("--", "- -"); // returns "- -- -", I expected "- - - -"

or:

var userData = "---";
return userData.Replace("--", "- -"); // returns "- --", I expected "- - -"

In the second example you can see, that this sanitization is useless and a malicious user can actually still end the comment.

Now my questions:

  1. Is this considered intended behavior for String.Replace?
  2. Can I easily achieve the output I intended to achieve?

Note: I know there are other ways to sanitize the output (eg. replacing the hyphens with underscores), but I am interested it this particular way (ie. spaces between -- subsequent dashes).

Upvotes: 0

Views: 182

Answers (5)

mehmet mecek
mehmet mecek

Reputation: 2685

  1. String.Replace does what you want once. (returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.) reference

  2. I would do it like,

    public static class StringExtensions
    {
        public static string ReplaceAllOccurrences(
            this string str,
            string oldValue,
            string newValue)
        {
            var result = str;
            while (result.Contains(oldValue))
            {
                result = result.Replace(oldValue, newValue);
            }
    
            return result;
        }
    }
    
    
    [TestClass]
    public class ReplaceAllOccurencesTest
    {
        [TestMethod]
        public void Test()
        {
            var userData = "----";
            var replaced = userData.ReplaceAllOccurrences("--", "- -"); // returns "- -- -", I expected "- - - -"
            Assert.AreEqual(replaced, "- - - -");
    
            userData = "---";
            replaced = userData.ReplaceAllOccurrences("--", "- -"); // returns "- --", I expected "- - -"
            Assert.AreEqual(replaced, "- - -");
        }
    }
    

Upvotes: 0

denisv
denisv

Reputation: 319

Maybe this can be useful:

var userData = "----";
userData = Regex.Replace(userData, @"-{1}", " -").TrimStart();

Upvotes: 0

rory.ap
rory.ap

Reputation: 35328

That is the intended behavior because your call to Replace is only making one pass at the string. Thus, every instance of "-" in your string is being replaced by "- -", each of which is abutting one another, e.g. "- -" next to "- -" next to "- -" and so on, which looks like this: "- -|- -|- -" (<-- vertical lines added for clarity).

Just run the replace a second time to clean up the abutting "-" characters that result from the first replace:

var result = userData.Replace("--", "- -").Replace("--", "- -"); 

I would also like to point out that, while for small examples this type of direct string manipulation is fine, you would want to consider using a System.Text.StringBuilder if you are going to broaden this to larger or more iterative string manipulation. Each time you modify a string (i.e. by concatenating, appending, or calling Replace), a new string is created in memory because strings are immutable. A StringBuilder, on the other hand, gets around this problem by working with a mutable collection of characters and only produces a string at the end when you call ToString.

Here is how you could do the same thing with a StringBuilder

var sb = new System.Text.StringBuilder(userData);
var result = sb.Replace("--", "- -").Replace("--", "- -").ToString(); 

Upvotes: 4

silverfighter
silverfighter

Reputation: 6882

Have you considered to work with RegEx? There is a RegEx.Replace() method and you can handle different variations and occurrences with appropriate RegEx pattern

https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.110).aspx

RegEx.Replace(stringToReplaceAndTest,"/-/g"," -");

This RegEx will look for dashes globally and replaces them with and blank+dash... But like I said you just have to find the right pattern... HTH

Upvotes: 1

Konstantin Zadiran
Konstantin Zadiran

Reputation: 1541

string.Replace does one pass through string. to achieve your expectations, do

while(userData.Contains("--"))
{
     userData = userData.Replace("--", "- -");
}

Upvotes: 0

Related Questions