Reputation: 210
Recently I used the String.Replace method to ensure that user input, which was later included into a HTML comment, is properly sanitized. I needed that input for later use so HttpUtility.HtmlEncode was not a choice.
What my code did was calling String.Replace("--", "- -") on the input. However, I realized that the Replace function did not behave as I expected. For example:
var userData = "----";
return userData.Replace("--", "- -"); // returns "- -- -", I expected "- - - -"
or:
var userData = "---";
return userData.Replace("--", "- -"); // returns "- --", I expected "- - -"
In the second example you can see, that this sanitization is useless and a malicious user can actually still end the comment.
Now my questions:
Note: I know there are other ways to sanitize the output (eg. replacing the hyphens with underscores), but I am interested it this particular way (ie. spaces between -- subsequent dashes).
Upvotes: 0
Views: 182
Reputation: 2685
String.Replace does what you want once. (returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.) reference
I would do it like,
public static class StringExtensions
{
public static string ReplaceAllOccurrences(
this string str,
string oldValue,
string newValue)
{
var result = str;
while (result.Contains(oldValue))
{
result = result.Replace(oldValue, newValue);
}
return result;
}
}
[TestClass]
public class ReplaceAllOccurencesTest
{
[TestMethod]
public void Test()
{
var userData = "----";
var replaced = userData.ReplaceAllOccurrences("--", "- -"); // returns "- -- -", I expected "- - - -"
Assert.AreEqual(replaced, "- - - -");
userData = "---";
replaced = userData.ReplaceAllOccurrences("--", "- -"); // returns "- --", I expected "- - -"
Assert.AreEqual(replaced, "- - -");
}
}
Upvotes: 0
Reputation: 319
Maybe this can be useful:
var userData = "----";
userData = Regex.Replace(userData, @"-{1}", " -").TrimStart();
Upvotes: 0
Reputation: 35328
That is the intended behavior because your call to Replace
is only making one pass at the string. Thus, every instance of "-" in your string is being replaced by "- -", each of which is abutting one another, e.g. "- -" next to "- -" next to "- -" and so on, which looks like this: "- -|- -|- -" (<-- vertical lines added for clarity).
Just run the replace a second time to clean up the abutting "-" characters that result from the first replace:
var result = userData.Replace("--", "- -").Replace("--", "- -");
I would also like to point out that, while for small examples this type of direct string manipulation is fine, you would want to consider using a System.Text.StringBuilder
if you are going to broaden this to larger or more iterative string manipulation. Each time you modify a string
(i.e. by concatenating, appending, or calling Replace
), a new string is created in memory because strings are immutable. A StringBuilder
, on the other hand, gets around this problem by working with a mutable collection of characters and only produces a string
at the end when you call ToString
.
Here is how you could do the same thing with a StringBuilder
var sb = new System.Text.StringBuilder(userData);
var result = sb.Replace("--", "- -").Replace("--", "- -").ToString();
Upvotes: 4
Reputation: 6882
Have you considered to work with RegEx
? There is a RegEx.Replace()
method and you can handle different variations and occurrences with appropriate RegEx
pattern
https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.110).aspx
RegEx.Replace(stringToReplaceAndTest,"/-/g"," -");
This RegEx will look for dashes globally and replaces them with and blank+dash... But like I said you just have to find the right pattern... HTH
Upvotes: 1
Reputation: 1541
string.Replace
does one pass through string. to achieve your expectations, do
while(userData.Contains("--"))
{
userData = userData.Replace("--", "- -");
}
Upvotes: 0