Reputation: 41
I want to compare two strings with function int compare(char* A, char*B)
, returning 1
when A
is larger, 0
when B
is larger.
Assume the maximum character is 100, I want the order to be ‘A’< ‘a’ < ‘B’ < ‘b’ < … < ‘Z’ < ‘z’.
The reason I dont use strcmp
is that strcmp will instead give ‘A’ < ‘B’ < … ‘a’ < ‘b’ < … < ‘z’. Is there a way to do it with the function?
function int compare(char* A, char* B){
//compare
}
(this is a C program)
for strcmp
, it gives ‘C’ < ‘a’ as it follows the ASCII code, but I want ‘A’< ‘a’ < ‘B’ < ‘b’ < … < ‘Z’ < ‘z’
Upvotes: 4
Views: 136
Reputation: 28290
Like in the other answer, start with a working manual implementation of strcmp
:
int my_str_compare(const char *s1, const char *s2)
{
for (;;)
{
// read a byte from each of the strings
int c1 = *s1;
int c2 = *s2;
if (c1 == 0 && c2 == 0)
{
// reached end of strings? - they are equal
return 0;
}
else if (c1 == c2)
{
// same bytes? - continue
++s1;
++s2;
}
else
{
// different? - return result
return c1 - c2;
}
}
}
Here the return value represents the "difference" between the strings. It is positive when s1 > s2
, zero when s1 == s2
and negative when s1 < s2
. When the strings are unequal, the "difference" between the strings can be defined as the difference between their first unequal bytes.
If you want to redefine the order of letters in the alphabet, you can implement this by "transforming" the bytes before comparing them.
// read a byte from each of the strings and transform it
int c1 = my_transform(*s1);
int c2 = my_transform(*s2);
... //the rest of the code is the same
For any order of letters, it is possible to define a suitable my_transform
function. In a general case, as a LUT (lookup table) - see the other answer. But in this case, it's possible to express it with bit fiddling. Bit 5 of ASCII code contains the case of the letter, while bits 0...4 contain its position in the alphabet. If you switch these bit-fields, you will turn the case into the "least significant" field, and the position in the alphabet into the "most significant" field. Here is how you could do that:
int my_transform(int x)
{
int _case = x >> 5;
int position = x & 0x1f;
return (position << 1) | _case;
}
This transform function will only work with alphabetical letters and the end-of-string byte (0). If your strings can contain any other byte, you should implement your transform function in a more robust (and less hacky/clever way) - use a LUT.
Upvotes: 1
Reputation: 108988
The standard way to do a case insensitive compare is to first convert to all lowercase (or uppercase) then do the standard case sensitive comparison.
Problem with your description is that this will make 'A' == 'a'
, 'B' == 'b'
, etc.
I propose you use a string literal with the valid characters in order, then strchr()
into that for each pair of input chracters, something like:
int compare(char* A, char* B){
static const char order[] = "AaBbCcDdEeF..XxYyZx";
while (*A || *B) {
if (!*A) return 0;
if (!*B) return 1;
if (*A == *B) { // advance A and B, repeat loop
A++;
B++;
continue;
}
char *pA = strchr(order, *A);
char *pB = strchr(order, *B);
if (pA && pB) {
if (pA > pB) return 1;
return 0;
} else {
// TODO UNSPECIFIED ERROR: invalid character
return 2024;
}
}
// TODO UNSPECIFIED ERROR: strings are equal
return 2024;
}
Upvotes: 1
Reputation: 117822
If we start by making an implementation for strcmp
, it could look like this:
// returns a negative value if A < B
// returns a positive value if A > B
// returns 0 if they are equal
int compare(const char *A, const char *B) {
for (; *A && *B; ++A, ++B) {
if (*A != *B) return *A - *B; // case sensitive
}
// if they are of equal length and reach here both *A and *B
// will be `\0`, otherwise the longest string will be considered
// greater than the shortest:
return *A - *B;
}
You can then add a case insensitive check before the case sensitive check:
#include <ctype.h>
int compare(const char *A, const char *B) {
for (; *A && *B; ++A, ++B) {
if (isalpha(*A) && isalpha(*B)) {
int ua = toupper((unsigned char)*A);
int ub = toupper((unsigned char)*B);
if (ua != ub) return ua - ub; // case insensitive
}
if (*A != *B) return *A - *B; // case sensitive
}
return *A - *B;
}
Note: This returns 0
if they are equal.
If total ordering is a requirement, you remove the isalpha
checks and then have to settle for what ordering you want by using either toupper
or tolower
which will give you different results for characters that happens to be numerically between the uppercase and lowercase letters.
int compare(const char *A, const char *B) {
for (; *A && *B; ++A, ++B) {
int ua = toupper((unsigned char)*A); // or tolower
int ub = toupper((unsigned char)*B); // or towloer
if (ua != ub) return ua - ub; // case insensitive
if (*A != *B) return *A - *B; // case sensitive
}
return *A - *B;
}
Upvotes: 5