GonzoKnight

Reputation: 829

Is it more or less efficient to perform a check before performing a Replace in C#?

This is an almost academic question but I'm curious as to its answer.

Suppose you have a loop that performs a routine replace on every row in a dataset. Let's say there are 10,000 such rows.

Is it more efficient to have something like this:

 Row = Row.Replace('X', 'Y');

Or to check whether the row even contains the character that is to be replaced in the first place, like this:

 if (Row.Contains('X')) Row = Row.Replace('X', 'Y');

Is there any difference in terms of efficiency? I realize that the difference might be very minor, but I'm interested in knowing whether one way is better than the other, regardless of how much better it may be. Also, would your answer be different if the probability of finding the character to be replaced were 10% rather than 90%?

Upvotes: 15

Views: 1890

Answers (4)

Simon Hughes

Reputation: 3574

Don't forget that strings in C# are IMMUTABLE. That means they cannot change.

For it to replace anything it has to create a new string in memory, and copy the data across, then garbage collect the old string later on.

Using Contains() first will prevent needless creation, copying, and garbage collection of string data, and will therefore perform faster.

Upvotes: -1

Neil Barnwell

Reputation: 42125

You need to measure first on a realistic dataset, then decide which option performs better. If your typical dataset rarely contains the character, then adding the Contains() call may be faster (because although Replace also iterates through all the chars in the string, it creates an extra string object that later has to be garbage collected, due to the immutability of strings), but if "X" is often present, the check becomes a waste and actually slows things down.
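As a rough illustration of "measure first", here is a minimal timing sketch using Stopwatch. The class name, the helper names, the sample row text, the 10% and 90% hit rates, and the run count are all assumptions made up for this example, not part of the question. A dedicated benchmarking tool would give more trustworthy numbers than a hand-rolled loop like this.

```csharp
using System;
using System.Diagnostics;
using System.Linq; // lets Contains('X') compile on older frameworks via the LINQ extension

public static class ReplaceBenchmark
{
    // Hypothetical test data: roughly hitRate of the rows contain 'X'.
    public static string[] BuildRows(int count, double hitRate, int seed = 42)
    {
        var random = new Random(seed);
        var rows = new string[count];
        for (int i = 0; i < count; i++)
        {
            rows[i] = random.NextDouble() < hitRate
                ? "some row text with X in it"
                : "some row text without it";
        }
        return rows;
    }

    // Option 1 from the question: always call Replace.
    public static string[] ReplaceOnly(string[] rows)
    {
        var result = new string[rows.Length];
        for (int i = 0; i < rows.Length; i++)
            result[i] = rows[i].Replace('X', 'Y');
        return result;
    }

    // Option 2 from the question: check with Contains, then Replace.
    public static string[] CheckThenReplace(string[] rows)
    {
        var result = new string[rows.Length];
        for (int i = 0; i < rows.Length; i++)
        {
            string row = rows[i];
            if (row.Contains('X')) row = row.Replace('X', 'Y');
            result[i] = row;
        }
        return result;
    }

    public static void Main()
    {
        const int runs = 1000; // arbitrary repeat count for this sketch
        foreach (double hitRate in new[] { 0.1, 0.9 })
        {
            string[] rows = BuildRows(10000, hitRate);

            // Warm-up calls so JIT compilation is not included in the timings.
            ReplaceOnly(rows);
            CheckThenReplace(rows);

            var watch = Stopwatch.StartNew();
            for (int run = 0; run < runs; run++) ReplaceOnly(rows);
            long replaceOnlyMs = watch.ElapsedMilliseconds;

            watch.Restart();
            for (int run = 0; run < runs; run++) CheckThenReplace(rows);
            long checkFirstMs = watch.ElapsedMilliseconds;

            Console.WriteLine(
                $"hit rate {hitRate:P0}: replace only {replaceOnlyMs} ms, check first {checkFirstMs} ms");
        }
    }
}
```

The numbers will vary with the runtime, the row length, and the hit rate, which is the point of measuring. It is also worth checking how Replace behaves on your runtime when nothing matches: the .NET documentation for String.Replace(char, char) says the method returns the current instance unchanged if oldChar is not found.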

Also, this typically isn't the first place to look for and worry about performance problems. Things like chatty interfaces, network I/O, web services, databases, file I/O and GUI updates are going to hurt you orders of magnitude more than stuff like this.

If you were going to do stuff like this, and if Row came back from a database (as its name suggests), then getting the database to do the filtering might be another way to improve performance. E.g.

select MyTextColumn from MyTable where MyTextColumn like '%X%'

Then perform the replacement on all the results, because you know you only returned results where the replacement was needed.

This does introduce other concerns though. For example, in SQL Server, if MyTextColumn were indexed, SQL Server wouldn't be able to use that index for the query above, because the LIKE pattern starts with a wildcard (it's not considered to be "sargable").

In summary, write for correctness, readability and maintenance first, then measure performance and make targeted improvements where they are found to be required.

Upvotes: 1

Mike Richards

Reputation: 5667

Your check, Row.Contains('X'), is an O(n) function, which means that it iterates over the string one character at a time to see if that character exists, reaching the end of the string whenever the character is absent.

Row.Replace('X', 'Y') works exactly the same way: it checks every single character, one character at a time.

So, if you have that check in place, you iterate over the string potentially twice. If you just replace, you iterate over the string once.
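To make the "once versus potentially twice" argument concrete, here is a small sketch that counts how many characters each approach would examine. It is a simplified model with hypothetical names (ScanCount, ContainsCost, and so on), assuming a naive character-by-character scan. It is not how the runtime actually implements Contains or Replace, and it ignores allocation cost entirely.

```csharp
using System;

public static class ScanCount
{
    // Models Contains: stops at the first match.
    // Returns the number of characters examined; found reports the result.
    public static int ContainsCost(string row, char target, out bool found)
    {
        for (int i = 0; i < row.Length; i++)
        {
            if (row[i] == target)
            {
                found = true;
                return i + 1;
            }
        }
        found = false;
        return row.Length;
    }

    // Models a naive Replace: every character is visited exactly once.
    public static int ReplaceCost(string row)
    {
        return row.Length;
    }

    // Models "if (Row.Contains('X')) Row = Row.Replace('X', 'Y');".
    public static int CheckThenReplaceCost(string row, char target)
    {
        bool found;
        int cost = ContainsCost(row, target, out found);
        return found ? cost + ReplaceCost(row) : cost;
    }

    public static void Main()
    {
        // 'X' present at index 3: the check examines 4 chars, then Replace visits all 7.
        Console.WriteLine(CheckThenReplaceCost("abcXdef", 'X')); // prints 11
        Console.WriteLine(ReplaceCost("abcXdef"));               // prints 7

        // 'X' absent: the check scans all 7 chars and Replace is skipped.
        Console.WriteLine(CheckThenReplaceCost("abcdefg", 'X')); // prints 7
        Console.WriteLine(ReplaceCost("abcdefg"));               // prints 7
    }
}
```

Under this model the check never reduces the number of characters examined: it ties when the character is absent and costs extra when it is present. Any benefit from checking first would have to come from avoiding an allocation, not from avoiding a scan.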

Upvotes: 12

InBetween

Reputation: 32750

The first option is faster. In order to check whether a substring is present, it first has to find it. As there won't be any caching mechanism, why not replace it directly? Otherwise you'd be searching twice. If 'X' is present many times, you would basically be doubling the effort.

Upvotes: 0
