Reputation: 41

Comparing alphabets but not strcmp

I want to compare two strings with function int compare(char* A, char*B), returning 1 when A is larger, 0 when B is larger.

Assume the maximum character is 100, I want the order to be ‘A’< ‘a’ < ‘B’ < ‘b’ < … < ‘Z’ < ‘z’.

The reason I dont use strcmp is that strcmp will instead give ‘A’ < ‘B’ < … ‘a’ < ‘b’ < … < ‘z’. Is there a way to do it with the function?

function int compare(char* A, char* B){
//compare
}

(this is a C program)

for strcmp, it gives ‘C’ < ‘a’ as it follows the ASCII code, but I want ‘A’< ‘a’ < ‘B’ < ‘b’ < … < ‘Z’ < ‘z’

Upvotes: 4

Answers (3)

anatolyg

Reputation: 28290

Like in the other answer, start with a working manual implementation of strcmp:

int my_str_compare(const char *s1, const char *s2)
{
    for (;;)
    {
        // read a byte from each of the strings
        int c1 = *s1;
        int c2 = *s2;
        if (c1 == 0 && c2 == 0)
        {
            // reached end of strings? - they are equal
            return 0;
        }
        else if (c1 == c2)
        {
            // same bytes? - continue
            ++s1;
            ++s2;
        }
        else
        {
            // different? - return result
            return c1 - c2;
        }
    }
}

Here the return value represents the "difference" between the strings. It is positive when s1 > s2, zero when s1 == s2 and negative when s1 < s2. When the strings are unequal, the "difference" between the strings can be defined as the difference between their first unequal bytes.

If you want to redefine the order of letters in the alphabet, you can implement this by "transforming" the bytes before comparing them.

        // read a byte from each of the strings and transform it
        int c1 = my_transform(*s1);
        int c2 = my_transform(*s2);
        ... //the rest of the code is the same

For any order of letters, it is possible to define a suitable my_transform function. In a general case, as a LUT (lookup table) - see the other answer. But in this case, it's possible to express it with bit fiddling. Bit 5 of ASCII code contains the case of the letter, while bits 0...4 contain its position in the alphabet. If you switch these bit-fields, you will turn the case into the "least significant" field, and the position in the alphabet into the "most significant" field. Here is how you could do that:

int my_transform(int x)
{
    int _case = x >> 5;
    int position = x & 0x1f;
    return (position << 1) | _case;
}

This transform function will only work with alphabetical letters and the end-of-string byte (0). If your strings can contain any other byte, you should implement your transform function in a more robust (and less hacky/clever way) - use a LUT.

Upvotes: 1

pmg

Reputation: 108988

The standard way to do a case insensitive compare is to first convert to all lowercase (or uppercase) then do the standard case sensitive comparison.

Problem with your description is that this will make 'A' == 'a', 'B' == 'b', etc.

I propose you use a string literal with the valid characters in order, then strchr() into that for each pair of input chracters, something like:

int compare(char* A, char* B){
    static const char order[] = "AaBbCcDdEeF..XxYyZx";
    while (*A || *B) {
        if (!*A) return 0;
        if (!*B) return 1;
        if (*A == *B) { // advance A and B, repeat loop 
            A++;
            B++;
            continue;
        }
        char *pA = strchr(order, *A);
        char *pB = strchr(order, *B);
        if (pA && pB) {
             if (pA > pB) return 1;
             return 0;
        } else {
            // TODO UNSPECIFIED ERROR: invalid character
            return 2024;
        }
    }
    // TODO UNSPECIFIED ERROR: strings are equal
    return 2024;
}

Upvotes: 1

Ted Lyngmo

Reputation: 117822

If we start by making an implementation for strcmp, it could look like this:

// returns a negative value if A < B
// returns a positive value if A > B
// returns 0 if they are equal
int compare(const char *A, const char *B) {
    for (; *A && *B; ++A, ++B) {
        if (*A != *B) return *A - *B;  // case sensitive
    }
    // if they are of equal length and reach here both *A and *B
    // will be `\0`, otherwise the longest string will be considered
    // greater than the shortest:
    return *A - *B;
}

You can then add a case insensitive check before the case sensitive check:

#include <ctype.h>

int compare(const char *A, const char *B) {
    for (; *A && *B; ++A, ++B) {
        if (isalpha(*A) && isalpha(*B)) {
            int ua = toupper((unsigned char)*A);
            int ub = toupper((unsigned char)*B);

            if (ua != ub) return ua - ub;  // case insensitive
        }
        if (*A != *B) return *A - *B;  // case sensitive
    }
    return *A - *B;
}

Note: This returns 0 if they are equal.

If total ordering is a requirement, you remove the isalpha checks and then have to settle for what ordering you want by using either toupper or tolower which will give you different results for characters that happens to be numerically between the uppercase and lowercase letters.

int compare(const char *A, const char *B) {
    for (; *A && *B; ++A, ++B) {
        int ua = toupper((unsigned char)*A); // or tolower
        int ub = toupper((unsigned char)*B); // or towloer
        if (ua != ub) return ua - ub;  // case insensitive
        if (*A != *B) return *A - *B;  // case sensitive
    }
    return *A - *B;
}

Upvotes: 5

Comparing alphabets but not strcmp

Answers (3)

Related Questions