Keith Miller
Keith Miller

Reputation: 1347

Most Efficient Way To Count How Many Times A Character Occurs within a String

I am writing a very simple function that counts how many times a certain character occurs within a given string. I have a working function but was wondering if there was a more efficient or preferred method of doing this.

Here is the function:

size_t strchroc(const char *str, const char ch)
{ 
    int c = 0, i = 0;

    while(str[i]) if(str[i++] == ch) c++;
    return c;
}

I personally cannot think of any way to get this code more efficient. And was wondering (just for the sake of learning) if anybody knew of a way to make this function more efficient.

(efficient in the sense of speed and using minimal resources).

Upvotes: 3

Views: 4398

Answers (6)

Runium
Runium

Reputation: 1085

After a quick low quality benchmark I ended up with this for strings of arbitrary length.

On huge strings (100M+) it did not show too much of a difference, but on shorter strings (sentences, normal text files etc.) the improvement was about 25%.

unsigned int countc_r(char *buf, char c)
{
    unsigned int k = 0;

    for (;;) {
        if (!buf[0]) break;
        if ( buf[0] == c) ++k;
        if (!buf[1]) break;
        if ( buf[1] == c) ++k;
        if (!buf[2]) break;
        if ( buf[2] == c) ++k;
        if (!buf[3]) break;
        if ( buf[3] == c) ++k;
        buf += 4;
    }

    return k;
}

Upvotes: 0

Bragi Acculum
Bragi Acculum

Reputation: 11

You should use strchr() (or memchr() if you know the length) before using your function. If there is a match, you can start from the position of the first matching character and then go from there.

This should be much faster unless your strings are very short, or it matches very early.

Upvotes: 1

Santeri Paavolainen
Santeri Paavolainen

Reputation: 411

First of all, unless your function is really time-sensitive, do not try to over-optimize. Just use the one you provided, as it is easy to verify for correctness and it doesn't try to be smart for just the heck of it.

If the function needs really to be fast then there are many ways to optimize it more. Many, really many ways. Some of them either expect or assume specific memory layout of the strings you have (for example, that they are allocated on word boundaries and the allocation is also always padded up to word boundary). So you'd need to be careful, as the algorithm might work on some combination of processor, compiler and memory allocator and fail miserably on others.

Just for the heck of it, I'll list some possible ways to speed up the character counter:

  • Read the string a word (32 or 64 bit integer) at a time. Not necessarily much of a help thanks to L1 caching and speculative/out-of-order execution. This needs end-of-loop adjustment for the last word (miscounting bytes after NUL terminator). Use only with word-aligned and padded memory allocators.
  • Remove the conditional, and instead calculate counts for all characters (to an array) and return the count for the wanted character. (This will remove the conditional and if you know string length in advance it makes for excellent loop unrolling plus removes one point of conditional branching.)
  • If you know the length of the string beforehand (calculated somewhere else) you can use that to unroll the loop. Or better, write it as a for-loop and apply a suitable #pragma and compiler options to make the compiler do loop unrolling for you.
  • Write the routine in assembler. Before going this way, crank up all compiler optimizations and disassemble the routine first -- you are likely to find out that the compiler already used all potential tricks you knew and several you didn't.
  • If your string is potentially very large (megabytes) -- and here I am speculating -- using a graphics card via OpenCL/CUDA might offer some potential.

And so on.

But I really, really suggest you stick with the one you have if you have a real-world problem. If this is a toy problem and you are optimizing for the fun of it, go ahead.

Cycle-shaving is a fun way to learn CPUs and instructions sets, but for 99.999999...% of programming tasks it is not worth the effort.

Upvotes: 5

wildplasser
wildplasser

Reputation: 44250

size_t count_the_string(const char *str, const char ch){
    size_t cnt ;
    for(cnt=0; *str; ) {
        cnt += *str++ == ch;
    }
    return cnt;
}

For the equivalent do { ...} while(); variant, GCC generates code without the conditional jump (except for the loop's jump, of course) , comparable to @hakattack 's solution.

size_t count_the_string2(const char *str, const char ch){
    size_t cnt=0 ;
    do {
        cnt += *str == ch;
    } while (*str++);
    return cnt;
}

Upvotes: 0

rodrigo
rodrigo

Reputation: 98476

You can use the pointer to iterate the string, and with a little effort use the * only once per character:

size_t strchroc(const char *str, const char ch)
{ 
    size_t c = 0;
    char n;
    while ((n=*str++), ((n==ch)? ++c : 0), n)
        ;
    return c;
}

Not that the compiler couldn't optimize yours to exactly the same code, but just for fun.

Upvotes: 3

hackattack
hackattack

Reputation: 1087

you can get rid of the variable i.

size_t strchroc(const char *str, const char ch){ 
    size_t c = 0;
    while(*str != '\0') {
        if(*str == ch) c++;
        str++;
    }
    return c;
}

Upvotes: 0

Related Questions