Ulrira
Ulrira

Reputation: 219

*str and *str++

I have this code (my strlen function)

size_t slen(const char *str)
{
    size_t len = 0;
    while (*str)
    {
        len++;
        str++;
    }
    return len;
}

Doing while (*str++), as shown below, the program execution time is much larger:

while (*str++)
{
    len++;
}

I'm doing this to probe the code

int main()
{
    double i = 11002110;
    const char str[] = "long string here blablablablablablablabla"
    while (i--)
        slen(str);

    return 0;
}

In first case the execution time is around 6.7 seconds, while in the second (using *str++), the time is around 10 seconds!

Why so much difference?

Upvotes: 8

Views: 10517

Answers (4)

Nik Bougalis
Nik Bougalis

Reputation: 10613

Others have already provided some excellent commentary, including analysis for the generated assembly code. I strongly recommend that you read them carefully. As they have pointed out this sort of question can't really be answered without some quantification, so let's and play with it a bit.

First, we're going to need a program. Our plan is this: we will generate strings whose lengths are powers of two, and try all functions in turn. We run through once to prime the cache and then separately time 4096 iterations using the highest-resolution available to us. Once we are done, we will calculate some basic statistics: min, max and the simple-moving average and dump it. We can then do some rudimentary analysis.

In addition to the two algorithms you've already shown, I will show a third option which doesn't involve the use of a counter at all, relying instead on a subtraction, and I'll mix things up by throwing in std::strlen, just to see what happens. It'll be an interesting throwdown.

Through the magic of television our little program is already written, so we compile it with gcc -std=c++11 -O3 speed.c and we get cranking producing some data. I've done two separate graphs, one for strings whose size is from 32 to 8192 bytes and another for strings whose size is from 16384 all the way to 1048576 bytes long. In the following graphs, the Y axis is the time consumed in nanoseconds and the X axis shows the length of the string in bytes.

Without further ado, let's look at performance for "small" strings from 32 to 8192 bytes:

Performance Plot - Small Strings

Now this is interesting. Not only is the std::strlen function outperforming everything across the board, it's doing it with gusto too since it's performance is a lot of more stable.

Will the situation change if we look at larger strings, from 16384 all the way to 1048576 bytes long?

enter image description here

Sort of. The difference is becoming even more pronounced. As our custom-written functions huff-and-puff, std::strlen continues to perform admirably.

An interesting observation to make is that you can't necessarily translate number of C++ instructions (or even, number of assembly instructions) to performance, since functions whose bodies consist of fewer instructions sometimes take longer to execute.

An even more interesting -- and important observation is to notice just how well the str::strlen function performs.

So what does all this get us?

First conclusion: don't reinvent the wheel. Use the standard functions available to you. Not only are they already written, but they are very very heavily optimized and will almost certainly outperform anything you can write unless you're Agner Fog.

Second conclusion: unless you have hard data from a profiler that a particular section of code or function is hot-spot in your application, don't bother optimizing code. Programmers are notoriously bad at detecting hot-spots by looking at high level function.

Third conclusion: prefer algorithmic optimizations in order to improve your code's performance. Put your mind to work and let the compiler shuffle bits around.

Your original question was: "why is function slen2 slower than slen1?" I could say that it isn't easy to answer without a lot more information, and even then it might be a lot longer and more involved than you care for. Instead what I'll say is this:

Who cares why? Why are you even bothering with this? Use std::strlen - which is better than anything that you can rig up - and move on to solving more important problems - because I'm sure that this isn't the biggest problem in your application.

Upvotes: 1

Blagovest Buyukliev
Blagovest Buyukliev

Reputation: 43508

Probably because the post-increment operator (used in the condition of the while statement) involves keeping a temporary copy of the variable with its old value.

What while (*str++) really means is:

while (tmp = *str, ++str, tmp)
  ...

By contrast, when you write str++; as a single statement in the body of the while loop, it is in a void context, hence the old value isn't fetched because it's not needed.

To summarise, in the *str++ case you have an assignment, 2 increments, and a jump in each iteration of the loop. In the other case you only have 2 increments and a jump.

Upvotes: 6

Hoa Long Tam
Hoa Long Tam

Reputation: 750

This depends on your compiler, compiler flags, and your architecture. With Apple's LLVM gcc 4.2.1, I don't get a noticeable change in performance between the two versions, and there really shouldn't be. A good compiler would turn the *str version into something like

IA-32 (AT&T Syntax):

slen:
        pushl %ebp             # Save old frame pointer
        movl  %esp, %ebp       # Initialize new frame pointer
        movl  -4(%ebp), %ecx   # Load str into %ecx
        xor   %eax, %eax       # Zero out %eax to hold len
loop:
        cmpb  (%ecx), $0       # Compare *str to 0
        je    done             # If *str is NUL, finish
        incl  %eax             # len++
        incl  %ecx             # str++
        j     loop             # Goto next iteration
done:
        popl  %ebp             # Restore old frame pointer
        ret                    # Return

The *str++ version could be compiled exactly the same (since changes to str aren't visible outside slen, when the increment actually occurs isn't important), or the body of the loop could be:

loop:
        incl  %ecx             # str++
        cmpb  -1(%ecx), $0     # Compare *str to 0
        je    done             # If *str is NUL, finish
        incl  %eax             # len++
        j     loop             # Goto next iteration

Upvotes: 1

Node
Node

Reputation: 3509

Trying this out on ideone.com, I get about 0.5s execution with *str++ here. Without, it takes just over a second (here). Using *str++ was faster. Perhaps with optimisation on *str++ can be done more efficiently.

Upvotes: 2

Related Questions