Haslo Vardos

Reputation: 322

Understanding strspn(const char *s, const char *accept) function

I am writing a program that reads a file line by line to separate each word from its translation. The code below works. However, I am unable to understand how the /* separate word and translation */ part of the load_dictionary function actually works. I ran it through **gdb**.

Things unclear:

FILE: dict.txt

WORD    TRANSLATION

ants    anttt
anti    eti
ante    soggy
anda    eggs

Function: main

#include <stdio.h>
#include <string.h>

/* maximum number of characters for word to search */
#define WORD_MAX 256

/* maximum number of characters in line */
#ifndef LINE_MAX
#define LINE_MAX 2048
#endif

/* defined below; declared here so main can call it */
void load_dictionary(const char * filename);

int main(int argc, char * argv[]) {
        if (argc <= 1)
            return 0; /* no dictionary specified */

        /* load dictionary */
        load_dictionary(argv[1]);
        return 0;
}

Function: load_dictionary (reads the dictionary file)

/* delimiter for dictionary */
#define DELIMS "\t"

void load_dictionary(const char * filename) {
        FILE * pfile;
        char line[LINE_MAX], * word, * translation;

        /* ensure file can be opened */
        if ( !(pfile = fopen(filename,"r")) )
            return;

        /* read lines */
        while ( (fgets(line, LINE_MAX, pfile)) ) {
            /* strip trailing newline */
            size_t len = strlen(line);
            if (len > 0 && line[len-1] == '\n') {
              line[len-1] = '\0';
              --len;
            }

            /* separate word and translation */
            word = line + strspn(line, DELIMS);

            if ( !word[0] )
                continue; /* no word in line */
            translation = word + strcspn(word, DELIMS);
            if (*translation)          /* only if a tab follows the word */
                *translation++ = '\0';
            translation += strspn(translation, DELIMS);
            /* word and translation now point to two separate strings */
        }
        fclose(pfile);
}

Upvotes: 0

Views: 347

Answers (2)

4386427

Reputation: 44274

strspn returns the number of initial characters of the string that are all present in DELIMS

strcspn returns the number of initial characters of the string that are all absent from DELIMS

(see http://man7.org/linux/man-pages/man3/strspn.3.html)

So the idea of the code is to use simple pointer arithmetic to make the word and translation pointers point at the first and the second word in the input. In addition, the code writes a NUL terminator right after the first word, so the line then looks like two separate strings.

Example:

line: \t\t\t\tC++\0\t\t\tA programming language
              ^  ^       ^
              |  |       |
              |  |       translation points here
              |  |
              |  NUL added here
              |
              word points here

So printing word and translation will give:

C++
A programming language

The code with additional comments:

        word = line + strspn(line, DELIMS);  // Skip tabs, i.e. 
                                             // make word point to the
                                             // first character which is
                                             // not a tab (aka \t)

        if ( !word[0] )
            continue; /* no word in line */

        translation = word + strcspn(word, DELIMS); // Make translation point to the
                                                    // first character after word 
                                                    // which is a tab (aka \t), i.e. it 
                                                    // points to the character just after
                                                    // the first word in line

        *translation++ = '\0';               // Add the NUL termination and
                                             // increment translation

        translation += strspn(translation, DELIMS);   // Skip tabs, i.e.
                                                      // make translation point to the
                                                      // first character of the second
                                                      // word in line

Upvotes: 2

tom

Reputation: 1313

I think you may need to post more code to make clear what is happening, but from what you have posted I suggest that you...

  • have a look at https://www.geeksforgeeks.org/strcspn-in-c/. The aim is to break the line up into individual words, and it seems that \t is expected between words; in the file there is a \t between the word and the translation. strcspn returns the number of characters up to the next \t character, the pointer is then moved on by that number of characters, and the \t character between the word and the translation is replaced by a \0 character.

The file is read line by line into the array char line[...., so line, when used as a pointer, points to the beginning of that array.

Upvotes: 0
