Reputation: 322
I am writing a program that reads a file line by line and separates each line into a word and its translation. The code below works. However, I am unable to understand how the /* separate word and translation */ part of the load_dictionary function actually works.
Things unclear:
The output of p line and p word after word = line + strspn(line, DELIMS). Isn't strspn supposed to read up to DELIMS (\t), so that word prints as ants\t?
FILE: dict.txt
WORD TRANSLATION
ants anttt
anti eti
ante soggy
anda eggs
Function: main
/* maximum number of characters for word to search */
#define WORD_MAX 256
/* maximum number of characters in line */
#ifndef LINE_MAX
#define LINE_MAX 2048
#endif
int main(int argc, char * argv[]) {
    char word[WORD_MAX], * translation;
    int len;
    if (argc <= 1)
        return 0; /* no dictionary specified */
    /* load dictionary */
    load_dictionary(argv[1]);
    return 0;
}
Function: load_dictionary (reads the dictionary file)
/* delimiter for dictionary */
#define DELIMS "\t"
void load_dictionary(const char * filename) {
    FILE * pfile;
    char line[LINE_MAX], * word, * translation;
    /* ensure file can be opened */
    if ( !(pfile = fopen(filename,"r")) )
        return;
    /* read lines */
    while ( (fgets(line, LINE_MAX, pfile)) ) {
        /* strip trailing newline */
        int len = strlen(line);
        if (len > 0 && line[len-1] == '\n') {
            line[len-1] = '\0';
            --len;
        }
        /* separate word and translation */
        word = line + strspn(line, DELIMS);
        if ( !word[0] )
            continue; /* no word in line */
        translation = word + strcspn(word, DELIMS);
        *translation++ = '\0';
        translation += strspn(translation, DELIMS);
    }
    /* release the file handle once all lines are read */
    fclose(pfile);
}
Upvotes: 0
Views: 347
Reputation: 44274
strspn returns the length of the initial segment of the string that consists only of characters present in DELIMS.
strcspn returns the length of the initial segment of the string that consists only of characters not present in DELIMS.
(see http://man7.org/linux/man-pages/man3/strspn.3.html)
So the idea of the code is to use simple pointer arithmetic to make the word and translation pointers point at the first word and the second word in the input, respectively. Further, the code writes a NUL terminator after the first word, so that the single line looks like two strings.
Example:
line: \t\t\t\tC++\0\t\t\tA programming language
              ^  ^       ^
              |  |       |
              |  |       translation points here
              |  |
              |  NUL added here
              |
              word points here
So printing word and translation will give:
C++
A programming language
The code with additional comments:
word = line + strspn(line, DELIMS); // Skip tabs, i.e.
// make word point to the
// first character which is
// not a tab (aka \t)
if ( !word[0] )
continue; /* no word in line */
translation = word + strcspn(word, DELIMS); // Make translation point to the
// first character after word
// which is a tab (aka \t), i.e. it
// points to the character just after
// the first word in line
*translation++ = '\0'; // Add the NUL termination and
// increment translation
translation += strspn(translation, DELIMS); // Skip tabs, i.e.
// make translation point to the
// second word in line, which is
// the translation
Upvotes: 2
Reputation: 1313
I think you may need to post more code to make clear what is happening, but from what you have posted I suggest the following.
A \t is expected between words: in the file there is a \t between the word and its translation.
So strcspn returns the number of characters up to the next \t character, and the pointers are then moved on by that number of characters. It looks like the \t character between word and translation is replaced by a \0 character. The file is read in line by line into the array char line[... . So the pointer line points to the beginning of the array line[... .
Upvotes: 0