Joshua Briefman
Joshua Briefman

Reputation: 4031

What is the fastest way to compare two strings in C?

For clarity I'm only talking about null terminated strings.

I'm familiar with the standard way of doing string comparisons in C with the usage of strcmp. But I feel like it's slow and inefficient.

I'm not necessarily looking for the easiest method but the most efficient.

Can the current comparison method (strcmp) be optimized further while the underlying code remains cross platform?

If strcmp can't be optimized further, what is the fastest way which I could perform the string comparison without strcmp?

Current use case:

Reference to current strcmp() implementation:

Edit: Clarified the solution does not need to be a modification of strcmp.

Edit 2: Added specific examples for this use case.

Upvotes: 0

Views: 2873

Answers (1)

chqrlie
chqrlie

Reputation: 144740

I'm afraid your reference imlementation for strcmp() is both inaccurate and irrelevant:

  • it is inaccurate because it compares characters using the char type instead of the unsigned char type as specified in the C11 Standard:

    7.24.4 Comparison functions

    The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

  • It is irrelevant because the actual implementation used by modern compilers is much more sophisticated, expanded inline using hand-coded assembly language.

Any generic implementation is likely to be less optimal, especially if coded to remain portable across platforms.

Here are a few directions to explore if your program's bottleneck is comparing strings.

  • Analyze your algorithms, try and find ways to reduce the number of comparisons: for example if you search for a string in an array, sorting that array and using a binary search with drastically reduce the number of comparisons.
  • If your strings are tokens used in many different places, allocate unique copies of these tokens and use those as scalar values. The strings will be equal if and only if the pointers are equal. I use this trick in compilers and interpreters all the time with a hash table.
  • If your strings have the same known length, you can use memcmp() instead of strcmp(). memcmp() is simpler than strcmp() and can be implemented even more efficiently in places where the strings are known to be properly aligned.

EDIT: with the extra information provided, you could use a structure like this for your strings:

typedef struct string_t {
    size_t len;
    size_t hash;  // optional
    char str[];   // flexible array, use [1] for pre-c99 compilers
} string_t;

You allocate this structure this way:

string_t *create_str(const char *s) {
    size_t len = strlen(s);
    string_t *str = malloc(sizeof(*str) + len + 1;
    str->len = len;
    str->hash = hash_str(s, len);
    memcpy(str->str, s, len + 1);
    return str;
}

If you can use these str things for all your strings, you can greatly improve the efficiency of the matching by first comparing the lengths or the hashes. You can still pass the str member to your library function, it is properly null terminated.

Upvotes: 4

Related Questions