Reputation: 217
I'm trying to extract the words from a .txt file which contains the following sentence:
Quando avevo cinqve anni, mia made mi perpeteva sempre che la felicita e la chiave della vita. Quando andai a squola mi domandrono come vuolessi essere da grande. Io scrissi: selice. Mi dissero che non avevo capito il corpito, e io dissi loro che non avevano capito la wita.
The problem is that the array I use to store the words also stores empty words, which always come after one of the following characters: ',', '.' and ':'.
I know that things like "empty words" or "empty chars" don't make sense, but please try the code with the text I've given and you'll understand.
Meanwhile, I'm trying to understand the use of sscanf with the scanset conversion sscanf(buffer, "%[^.,:]"); which should allow me to store strings while ignoring the '.', ',' and ':' characters. However, I don't know what I should write in %[^] to also ignore the space character ' ', which always gets saved.
The code is the following:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
static void load_array(const char* file_name){
    char buffer[2048];
    char a[100][100];
    int buf_size = 2048;
    FILE *fp;
    int j = 0, c = 0;
    printf("\nLoading data from file...\n");
    fp = fopen(file_name,"r");
    if(fp == NULL){
        fprintf(stderr,"main: unable to open the file");
        exit(EXIT_FAILURE);
    }
    fgets(buffer,buf_size,fp);
    //here I store each word in an array of strings; when I encounter
    //an unwanted char I save the word into the next element of the
    //array
    for(int i = 0; i < strlen(buffer); i++) {
        if((buffer[i] >= 'a' && buffer[i] <= 'z') || (buffer[i] >= 'A' && buffer[i] <= 'Z')) {
            a[j][c++] = buffer[i];
        } else {
            j++;
            c = 0;
            continue;
        }
    }
    //this print is used only to see the words in the array of strings
    for(int i = 0; i < 100; i++)
        printf("%s %d\n", a[i], i);
    fclose(fp);
    printf("\nData loaded\n");
}
//Here I pass the file_name from the command line
int main(int argc, char const *argv[]) {
    if(argc < 2) {
        printf("Usage: ordered_array_main <file_name>\n");
        exit(EXIT_FAILURE);
    }
    load_array(argv[1]);
}
I know that I should store only the necessary number of words and not 100 every time. I want to think about that later on; right now I want to fix the issue with the empty words.
Compilation and execution:
gcc -o testloadfile testloadfile.c
./testloadfile "correctme.txt"
Upvotes: 0
Views: 1782
Reputation: 36082
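You could instead try to use strtok:
fgets(buffer,buf_size,fp);
// strtok returns NULL when no tokens are left, so test the pointer itself;
// "\n" is a delimiter too because fgets keeps the trailing newline
for (char* tok = strtok(buffer,".,: \n"); tok != NULL; tok = strtok(NULL,".,: \n"))
{
    printf("%s\n", tok);
}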
Note that if you want to store what strtok returns, you need to copy the contents that tok points to (for example with strdup, or malloc + strcpy). strtok modifies the buffer in place by writing '\0' over the delimiters, and the pointers it returns refer into that buffer, so they stop being valid as soon as the buffer is reused.
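As a minimal sketch of that copying step, the function below tokenizes a line with strtok and copies each token into a fixed-size array like the one in the question. The name split_with_strtok and the MAX_WORDS / MAX_LEN limits are assumptions made for this illustration, not part of the original code.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 100   /* assumed limit, mirrors a[100][100] in the question */
#define MAX_LEN   100

/* Splits `buffer` in place on the delimiters and copies each token into
   `words`. Returns the number of words stored. `buffer` is modified. */
static int split_with_strtok(char *buffer, char words[][MAX_LEN], int max_words)
{
    const char *delims = ".,: \t\n";
    int count = 0;

    for (char *tok = strtok(buffer, delims);
         tok != NULL && count < max_words;
         tok = strtok(NULL, delims)) {
        /* tok points into buffer, so copy it (truncating overlong tokens) */
        snprintf(words[count], MAX_LEN, "%s", tok);
        count++;
    }
    return count;
}
```

Called on the buffer filled by fgets, it returns how many words were stored, so the printing loop can stop at that count instead of always running to 100.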
Upvotes: 1
Reputation: 2506
You forgot to add the final '\0' to each row of a, and your algorithm has several flaws. For example, you increment j each time a non-letter appears. What if you have ", "? You increment j twice instead of once, which leaves an empty row between two words.
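A minimal sketch of how the original loop could be repaired along those lines: terminate every word with '\0', and move on to the next row only when the current one actually holds letters. The function name split_letters and the MAX_WORDS / MAX_LEN limits are assumptions for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_WORDS 100   /* assumed limit, mirrors a[100][100] in the question */
#define MAX_LEN   100

/* Copies each run of letters in `text` into `words` and returns the count. */
static int split_letters(const char *text, char words[][MAX_LEN], int max_words)
{
    int j = 0;  /* index of the word being built */
    int c = 0;  /* number of letters stored in that word so far */

    for (size_t i = 0; text[i] != '\0' && j < max_words; i++) {
        if (isalpha((unsigned char)text[i])) {
            if (c < MAX_LEN - 1)          /* keep room for the terminator */
                words[j][c++] = text[i];
        } else if (c > 0) {
            /* First non-letter after a word: close it and advance once.
               Further non-letters are skipped because c is 0 again. */
            words[j][c] = '\0';
            j++;
            c = 0;
        }
    }
    if (c > 0 && j < max_words) {         /* text ended in the middle of a word */
        words[j][c] = '\0';
        j++;
    }
    return j;
}
```

Because the function returns the number of words, the caller can print only the rows that were filled.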
One "easy" way is to use strtok, as Anders K. shows:
fgets(buffer,buf_size,fp);
// strtok returns NULL once the tokens run out, so check tok, not *tok
for (char* tok = strtok(buffer,".,:"); tok != NULL; tok = strtok(NULL,".,:")) {
    printf("%s\n", tok);
}
The "problem" with that function is that you have to specify every delimiter yourself, so you also have to add ' ' (space), '\t' (tab), '\n' (newline) and so on.
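The same "list every delimiter" idea applies to the sscanf scanset mentioned in the question: the space and the other whitespace characters simply go inside the brackets. The sketch below is one possible way to do it, assuming a hypothetical helper split_with_sscanf and fixed MAX_WORDS / MAX_LEN limits; it uses %n to learn how many characters each call consumed.

```c
#include <stdio.h>

#define MAX_WORDS 100   /* assumed limits for this illustration */
#define MAX_LEN   100   /* must stay in sync with the %99 width below */

/* Extracts delimiter-separated words from `text` with sscanf scansets. */
static int split_with_sscanf(const char *text, char words[][MAX_LEN], int max_words)
{
    int count = 0;

    while (count < max_words) {
        int n = 0;
        /* Skip a run of delimiters. %n is stored only if the scanset
           matched, otherwise n keeps its value of 0. */
        sscanf(text, "%*[.,: \t\n]%n", &n);
        text += n;

        n = 0;
        /* Read up to 99 characters that are NOT delimiters. */
        if (sscanf(text, "%99[^.,: \t\n]%n", words[count], &n) != 1)
            break;                        /* end of string reached */
        text += n;
        count++;
    }
    return count;
}
```

Unlike strtok, this leaves the input string untouched, at the cost of two sscanf calls per word.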
Since you define a "word" as something that contains only letters, lowercase or uppercase, you can do the following:
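#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char line[] = "Hello ! What a beautiful day, isn't it ?";
    char *beginWord = NULL;
    for (size_t i = 0; line[i]; ++i) {
        // isalpha needs a value representable as unsigned char
        if (isalpha((unsigned char)line[i])) { // upper or lower letter ==> valid character for a word
            if (!beginWord) {
                // We found the beginning of a word
                beginWord = line + i;
            }
        } else {
            if (beginWord) {
                // We found the end of a word: terminate it temporarily to print it
                char tmp = line[i];
                line[i] = '\0';
                printf("'%s'\n", beginWord);
                line[i] = tmp;
                beginWord = NULL;
            }
        }
    }
    if (beginWord) {
        // The line ended in the middle of a word, so print that last word too
        printf("'%s'\n", beginWord);
    }
    return (0);
}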
Note how "isn't" is split into "isn" and "t", since ' is not an accepted character for your words.
The algorithm is pretty simple: we loop over the string, and if the current character is a valid letter and beginWord == NULL, then it is the beginning of a word. If it is not a valid letter and beginWord != NULL, then it is the end of a word. This way you can have any number of non-letter characters between two words and still detect each word cleanly.
Upvotes: 0