lovesh
lovesh

Reputation: 5401

a better than naive implementation of strcspn()

I have to figure out if my subject string has any of the bad characters(some characters i absolutely hate). so if i have a string called str(char *str) and if any of the characters in the string bad(char *bad) are found the string str is rejected. Now i could use strcspn(str,bad) to check this. But can somebody suggest what the implementation of strcspn could be? A naive implementation would be to check each character of str against each character of bad and reject str if a match is found.

for(i=0;str[i]!='\0';i++)
  for(j=0;bad[j]!='\0';j++)
    if(bad[j]==str[i])
       return -1;   //reject string
return 1;    //accept string

or something like

for(i=0;str[i]!='\0';i++)
  if(strchr(bad,str[i]))   //will return non-NULL if str[i] is found in bad
    return -1;   //reject string
return 1;    //accept string

Upvotes: 0

Views: 2952

Answers (2)

If you care about worst-case performance and/or thread safety, here's a variant that keeps the table on the stack.

#include <limits.h>
#include <stddef.h>

int contains_bad(const char *str, const char *bad) {
  size_t hint[UCHAR_MAX];
  size_t len_bad;
  for (len_bad = 0; bad[len_bad]; len_bad++) {
    hint[(unsigned char)bad[len_bad] - 1] = len_bad;
  }
  for (; *str; str++) {
    size_t i = hint[(unsigned char)*str - 1];
    if (i < len_bad && *str == bad[i]) return 1;
  }
  return 0;
}

(For the language lawyers: technically the uninitialized array hint can have trap representations lurking. If you know what this means and actually care about the behavior of C programs on long-dead platforms, one solution, if bad is a set and thus has no more than UCHAR_MAX characters, is to change size_t hint[UCHAR_MAX] to unsigned char hint[UCHAR_MAX], since unsigned char is guaranteed not to have trap representations.)

Upvotes: 1

Tam&#225;s
Tam&#225;s

Reputation: 48061

If str is very long (or you are going to check many strings against the same set of bad characters), you may gain some performance by creating a lookup table of size 256 where element i is 1 if the character with ASCII code i is bad and zero otherwise:

int contains_bad(const char* str, const char* bad) {
    unsigned short int table[256];
    char* ch;

    /* Prepare the lookup table */
    memset(table, 0, 256);
    for (ch = bad; *ch != 0; ch++)
        table[*ch] = 1;

    /* Test the string */
    for (ch = str; *ch != 0; ch++)
        if (table[*ch])
            return -1;

    return 1;
}

The above code is O(m+n) worst case, where m is the length of bad and n is the length of str; your solution is O(mn) worst case.


Update: here is the alternative version of the function which keeps the lookup table in static storage and clears it only once in every 255 invocations.

int contains_bad(const char* str, const char* bad) {
    static unsigned short int table[256];
    static unsigned short int marker = 255;
    char* ch;

    /* Prepare the lookup table */
    if (marker == 255) {
        memset(table, 0, 256);
        marker = 1;
    } else {
        marker++;
    }
    for (ch = bad; *ch != 0; ch++)
        table[*ch] = marker;

    /* Test the string */
    for (ch = str; *ch != 0; ch++)
         if (table[*ch] == marker)
             return -1;

    return 1;
}

Upvotes: 8

Related Questions