Reputation: 5401
I have to figure out if my subject string has any of the bad characters(some characters i absolutely hate). so if i have a string called str(char *str)
and if any of the characters in the string bad(char *bad)
are found the string str
is rejected. Now i could use strcspn(str,bad)
to check this. But can somebody suggest what the implementation of strcspn
could be?
A naive implementation would be to check each character of str
against each character of bad
and reject str
if a match is found.
for(i=0;str[i]!='\0';i++)
for(j=0;bad[j]!='\0';j++)
if(bad[j]==str[i])
return -1; //reject string
return 1; //accept string
or something like
for(i=0;str[i]!='\0';i++)
if(strchr(bad,str[i])) //will return non-NULL if str[i] is found in bad
return -1; //reject string
return 1; //accept string
Upvotes: 0
Views: 2952
Reputation: 614
If you care about worst-case performance and/or thread safety, here's a variant that keeps the table on the stack.
#include <limits.h>
#include <stddef.h>
int contains_bad(const char *str, const char *bad) {
size_t hint[UCHAR_MAX];
size_t len_bad;
for (len_bad = 0; bad[len_bad]; len_bad++) {
hint[(unsigned char)bad[len_bad] - 1] = len_bad;
}
for (; *str; str++) {
size_t i = hint[(unsigned char)*str - 1];
if (i < len_bad && *str == bad[i]) return 1;
}
return 0;
}
(For the language lawyers: technically the uninitialized array hint
can have trap representations lurking. If you know what this means and actually care about the behavior of C programs on long-dead platforms, one solution, if bad
is a set and thus has no more than UCHAR_MAX
characters, is to change size_t hint[UCHAR_MAX]
to unsigned char hint[UCHAR_MAX]
, since unsigned char is guaranteed not to have trap representations.)
Upvotes: 1
Reputation: 48061
If str
is very long (or you are going to check many strings against the same set of bad characters), you may gain some performance by creating a lookup table of size 256 where element i is 1 if the character with ASCII code i is bad and zero otherwise:
int contains_bad(const char* str, const char* bad) {
unsigned short int table[256];
char* ch;
/* Prepare the lookup table */
memset(table, 0, 256);
for (ch = bad; *ch != 0; ch++)
table[*ch] = 1;
/* Test the string */
for (ch = str; *ch != 0; ch++)
if (table[*ch])
return -1;
return 1;
}
The above code is O(m+n) worst case, where m is the length of bad and n is the length of str; your solution is O(mn) worst case.
Update: here is the alternative version of the function which keeps the lookup table in static storage and clears it only once in every 255 invocations.
int contains_bad(const char* str, const char* bad) {
static unsigned short int table[256];
static unsigned short int marker = 255;
char* ch;
/* Prepare the lookup table */
if (marker == 255) {
memset(table, 0, 256);
marker = 1;
} else {
marker++;
}
for (ch = bad; *ch != 0; ch++)
table[*ch] = marker;
/* Test the string */
for (ch = str; *ch != 0; ch++)
if (table[*ch] == marker)
return -1;
return 1;
}
Upvotes: 8