Reputation: 24233
I've heard this so many times, that I have taken it for granted. But thinking back on it, can someone help me realize why string manipulation, say comparison etc, is more expensive than say an integer, or some other primitive?
Upvotes: 1
Views: 2880
Reputation: 934
There are several things to consider when looking at the "cost" of manipulating strings. There is the cost in terms of memory usage, there is the cost in terms of CPU cycles used, and there is a cost associated with the complexity of the code involved.
Integer manipulation (Add, Subtract, Multipy, Divide, Compare) is most often done by the CPU at the hardware level, in few (or even 1) instruction. When the manipulation is done, the answer fits back in the same size chunk of memory.
Strings are stored in blocks of memory, which have to be manipulated a byte or word at a time. Comparing two 100 character long strings may require 100 separate comparison operations.
Any manipulation that makes a string longer will require, either moving the string to a bigger block of memory, or moving other stuff around in memory to allow growing the existing block.
Any manipulation that leaves the string the same, or smaller, could be done in place, if the language allows for it. If not, then again, a new block of memory has to be allocated and contents moved.
Upvotes: 0
Reputation: 942109
The int data type in C# was carefully selected to be a good match with processor design. Which can store an int in a cpu register, a storage location that's an easy factor of 3 faster than memory. And a single cpu instruction to compare values of type int. The CMP instruction runs in less than a single cpu cycle, a fraction of a nano-second.
That doesn't work nearly as well for a string, it is a variable length data type and every single char in the string must be compared to test for equality. So it is automatically proportionally slower by the size of the string. Furthermore, string comparison is afflicted by culture dependent comparison rules. The kind that make "ss" and "ß" equal in German and "Aa" and "Å" equal in Danish. Nothing subtle to deal with, taken care of by highly optimized table-driven code inside the CLR. It can't beat CMP.
Upvotes: 2
Reputation: 7119
8bit example:
1 bit can be 1 or 0. With 2 bits you can represent 0, 1, 2, and 3. And so on. With a byte you have 2^8 possibilities, from 0 to 255.
In a string a single letter is stored in a byte, so "Hello world" is 11 bytes.
If I want to do 100 + 100, 100 is stored in 1 byte of memory, I need only two bytes to sum two numbers. The result will need again 1 byte.
Now let's try with strings, "100" + "100", this is 3 bytes plus 3 bytes and the result, "100100" needs 6 bytes to be stored.
This is over-simplified, but more or less it works in this way.
Upvotes: 4
Reputation: 5900
I've always thought it was because of the immutability of strings. That is, every time you make a change to the string, it requires allocating memory for a whole new string (rather than modifying the original in place).
Probably a woefully naive understanding but perhaps someone else can expound further.
Upvotes: 0