Leon

Reputation: 3401

String.Replace doesn't replace all matches

Why does line2 replace only every other occurrence?

    Dim line1 As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"
    Dim line2 As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"
    Dim line3 As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"

    line1 = line1.Replace("CCC", "")
    line2 = line2.Replace("|CCC|", "||")
    line3 = line3.Replace("CCC|", "|")

Result:

line1 = "AAA|BBB|||||EEE|FFF" -- OK, but fails when element is "..|ZZZCCCZZZ|.."
line2 = "AAA|BBB||CCC||CCC|EEE|FFF" -- Not OK
line3 = "AAA|BBB|||||EEE|FFF" -- OK, but fails similar to Line1 edge-case for "..|ZZZCCC|.."

I have tried using RegEx, but get similar results.

Is there really no better way than the loop below?

Do While line1.Contains("|CCC|")
    line1 = line1.Replace("|CCC|", "||")
Loop

Upvotes: 4

Views: 2849

Answers (4)

user166390

I might use a regular expression replace with a look-around for this case.

Consider this example:

Regex.Replace("FCCCF|CCC|CCC|", "((?<=[|])CCC(?=[|]))", "")
// ->
"FCCCF|||"

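Because the look-arounds are zero-width, the pipes themselves are not consumed, so neighbouring matches can share a delimiter. This matches every occurrence in a single pass and cannot loop forever. To use it, you need to switch to a suitable look-around pattern and adjust the replacement text (here an empty string rather than "||").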
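
As a rough illustration in VB.NET, the same idea could be applied to the line from the question. This is only a sketch: the module name is made up, and the pattern is extended with `^` and `$` on the assumption that a field at the very start or end of the line should also be cleared, which the original pattern does not do.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookAroundDemo
    Sub Main()
        ' Assumption: a field is delimited by "|" or by the start/end of the line.
        ' VB string literals have no backslash escapes, so "\|" reaches the regex engine as-is.
        Dim pattern As String = "(?<=^|\|)CCC(?=\||$)"

        ' Adjacent fields share a pipe, yet all four are matched in one pass.
        Console.WriteLine(Regex.Replace("AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF", pattern, ""))
        ' Prints: AAA|BBB|||||EEE|FFF

        ' A field that merely contains CCC is left alone.
        Console.WriteLine(Regex.Replace("CCC|ZZZCCCZZZ|CCC", pattern, ""))
        ' Prints: |ZZZCCCZZZ|
    End Sub
End Module
```

If the value to remove can contain regex metacharacters, passing it through `Regex.Escape` before building the pattern would be a sensible precaution.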

However, note per Chris's comment:

Regex.Replace("FCCCF|CCC|CCC||CCC|", "((?<=[|])CCC(?=[|]))", "")
// -> only 5 pipes: verify this is correct per the intended semantics
"FCCCF|||||"

Upvotes: 0

Leon

Reputation: 3401

For anyone in the future, I've added an extension method to overcome this limitation in the framework (in VB, extension methods must be declared inside a Module):

<System.Runtime.CompilerServices.Extension()>
Public Function ReplaceAll(ByVal original As String, ByVal oldValue As String, ByVal newValue As String) As String

    If newValue.Contains(oldValue) Then
        ' The second argument is the parameter name, not the parameter's value.
        Throw New ArgumentException("newValue can't contain oldValue, as infinite replacements can occur.", "newValue")
    End If

    Dim maxIterations As Integer = original.Length \ oldValue.Length

    While maxIterations > 0 AndAlso original.Contains(oldValue)
        original = original.Replace(oldValue, newValue)
        maxIterations -= 1
    End While

    Return original

End Function

Upvotes: 1

juharr

Reputation: 32296

Instead of using regular expressions or string.Replace you could parse the values, filter the ones you don't want and join them back together.

line1 = string.Join("|", line1.Split('|').Select(s => s == "CCC" ? "" : s).ToArray());

Sorry I don't know the VB equivalent.
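
A VB.NET version of the same split, filter, and join approach might look roughly like the sketch below. The module name and the `fields` variable are illustrative only and not part of the original answer.

```vbnet
Imports System
Imports System.Linq

Module SplitFilterJoinDemo
    Sub Main()
        Dim line1 As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"

        ' Split on the delimiter, blank out fields that are exactly "CCC", then rejoin.
        ' "|"c is a Char literal; If(cond, a, b) is VB's ternary operator.
        Dim fields = line1.Split("|"c).Select(Function(s) If(s = "CCC", "", s)).ToArray()
        line1 = String.Join("|", fields)

        Console.WriteLine(line1)
        ' Prints: AAA|BBB|||||EEE|FFF
    End Sub
End Module
```

Because each field is compared as a whole, an element such as "ZZZCCCZZZ" is left untouched, which avoids the edge cases noted in the question.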

Upvotes: 3

Chris Sinclair

Reputation: 23208

Once it finds the first token, it starts looking for the next one after that token. So it finds |CCC|, replaces it, then continues on and the first thing it sees is CCC| which doesn't match. It doesn't pre-scan the string looking for tokens to replace.

Consider it like this:

Given AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF

It runs to AAA|BBB|CCC| HOLD IT |CCC| was found, let's start building our string:

AAA|BBB + || (our replacement)

Now let's move on, we now have CCC|CCC|CCC|EEE|FFF left to work with.

It runs to CCC|CCC| HOLD IT |CCC| was found, let's continue adding to our string:

AAA|BBB||CCC + || (our replacement)

Now let's move on, we now have CCC|CCC|EEE|FFF and so on and so on.
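
The walk-through above can be mimicked with a small hand-written loop. This is a sketch of the behaviour as described here, not the framework's actual implementation; `ReplaceOnePass` is a hypothetical helper name.

```vbnet
Imports System
Imports System.Text

Module ScanDemo
    ' Hypothetical helper: one left-to-right pass over non-overlapping matches.
    Function ReplaceOnePass(input As String, oldValue As String, newValue As String) As String
        If String.IsNullOrEmpty(oldValue) Then
            Throw New ArgumentException("oldValue must not be empty.", "oldValue")
        End If

        Dim result As New StringBuilder()
        Dim position As Integer = 0

        Do
            Dim hit As Integer = input.IndexOf(oldValue, position, StringComparison.Ordinal)
            If hit < 0 Then Exit Do

            ' Copy the text before the match, then the replacement.
            result.Append(input, position, hit - position)
            result.Append(newValue)

            ' Resume after the match: its trailing "|" has already been consumed,
            ' so the next "CCC|" no longer has a leading pipe to match against.
            position = hit + oldValue.Length
        Loop

        ' Copy whatever is left after the last match.
        result.Append(input, position, input.Length - position)
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(ReplaceOnePass("AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF", "|CCC|", "||"))
        ' Prints: AAA|BBB||CCC||CCC|EEE|FFF
    End Sub
End Module
```

The output matches the "Not OK" result for line2 in the question: every second CCC survives because its leading pipe belonged to the previous match.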

EDIT: Considering the entry on MSDN describing the return value:

A string that is equivalent to the current string except that all instances of oldValue are replaced with newValue.

One could read that the way you expect: that it pre-scans the string and finds all matches, including overlapping ones. I don't see anything in the MSDN doc that describes this corner case. Perhaps this is something that should be added to the MSDN doc.

Upvotes: 9
