Jayamal Jayamaha

Reputation: 207

How to ignore non-alphabetic characters in C when reading a file

I am reading a file using fscanf. I want to ignore non-alphabetic characters such as commas, \, :, and dots.

This is my code:

FILE *fp;
fp = fopen(fl,"r");
char c[50];

while(fscanf(fp, "%s" ,c)!= EOF){
    linkLst(c);

}

fclose(fp);

How do I read the file word by word, ignoring non-alphabetic characters?

{ Part of the file is as follows:

WITH ANSWERS:

1) What written language is the most complicated in the world? (Hint: It uses four character sets.) (Is this question too easy?) >> Japanese

2) What language has a vocabulary primarily of Arabic origin (about 70%, I'm told), but uses the Roman alphabet? (I'd like to know where you found the answer!) >> Maltese

3) What non-Romance language uses a tilde (~) over the letter N? >> Estonian }

Upvotes: 1

Views: 3600

Answers (4)

Ahmed Masud

Reputation: 22412

I am going to stick to reading the content "word by word" rather than worrying about the linked list.

The scanf family of functions can do trivial parsing on its own, so there is no need to go character by character. If you do want to parse the strings char by char, simply use fgets and perform the parsing you need.

I'll stick with scanf as that's what you're using:

Starting with a simple file (foo.txt) containing:

hello there, how are you?

and trying to scanf it:

bad example 1:

/* NOTE: this code does NOT do what you want */
#include <stdio.h>

int main() {
    char foo[128];
    FILE *fp;
    fp = fopen("foo.txt", "r");
    do {
        fscanf(fp, "%[A-Za-z0-9]", foo);
        fprintf(stderr, "foo: %s\n", foo);
    } while(1);
    return 0;
}

You get an infinite loop printing hello: the scanset matches only letters and digits, so fscanf can never mung (consume) the space after "hello", and foo is never overwritten.

So let's add a munger:

#include <stdio.h>

int main() {
    char foo[128];
    char mung[128];
    int rv = 0;   /* int, not char: fscanf returns int and EOF must compare reliably */
    FILE *fp;
    fp = fopen("foo.txt", "r");
    do {
        rv = fscanf(fp, "%[A-Za-z0-9]%[^A-Za-z0-9]", foo, mung);
        if(rv == EOF)
            break;
        fprintf(stderr, "foo: %s\n", foo);
    }while(1);
}

The munger consumes everything that is not in the character set we are looking for (putting a ^ as the first character inside the [] makes scanf negate the set).

This will print out:

 foo: hello
 foo: there
 foo: how
 foo: are
 foo: you

Now, if we are being clever, we can skip the assignment to the mung variable by adding the * assignment-suppression flag:

#include <stdio.h>

int main() {
    char foo[128];
    int rv = 0;   /* int, not char: fscanf returns int and EOF must compare reliably */
    FILE *fp;
    fp = fopen("foo.txt", "r");
    do {
        rv = fscanf(fp, "%[A-Za-z0-9]%*[^A-Za-z0-9]", foo);
        if(rv == EOF)
            break;
        fprintf(stderr, "foo: %s\n", foo);
    }while(1);
}

Obviously, my examples assume that each word fits in 128 bytes, which we cannot know in advance. However, scanf (since the POSIX.1-2008 standard) lets you ask for dynamically allocated strings with the m modifier; you have to free them later on, so:

#include <stdio.h>
#include <stdlib.h>

int main() {
    char *foo;
    int rv = 0;   /* int, not char: fscanf returns int and EOF must compare reliably */
    FILE *fp;
    fp = fopen("foo.txt", "r");
    do {
        /* notice the & before foo, because fscanf will modify the pointer
         * variable itself and assign it a new pointer after allocating 
         * the space for the string
         */
        rv = fscanf(fp, "%m[A-Za-z0-9]%*[^A-Za-z0-9]", &foo);

        if(rv == EOF)
            break;

        fprintf(stderr, "foo: %s\n", foo);
        /* store the foo pointer somewhere for use and free it later,
         * if you are sticking it in a linked list, then you should 
         * free it whenever you free the corresponding node.
         *
         * I am just going to free it here after printing it out
         */
        free(foo); 
    }while(1);
}

Update

As BLUEPIXY points out, this does not consume input that begins with non-matching characters. So something like

)) oops helo

gets stuck in a loop printing (null), because the first conversion fails and the munger is never reached.

This means munging has to be a separate operation, so that the non-matching characters are consumed even when no word was read:

#include <stdio.h>
#include <stdlib.h>

int main() {
    char *foo;
    int rv = 0;   /* int, not char: fscanf returns int and EOF must compare reliably */
    FILE *fp;
    fp = fopen("foo.txt", "r");
    do {
        /* notice the & before foo, because fscanf will modify the pointer
         * variable itself and assign it a new pointer after allocating 
         * the space for the string
         */
        rv = fscanf(fp, "%m[A-Za-z0-9]", &foo);

        if(rv == EOF)
            break;

        /* rv is 1 only when a word was matched and foo was allocated;
         * on a matching failure foo is left untouched, so test rv, not foo */
        if (rv == 1) {
             fprintf(stderr, "foo: %s\n", foo);
        /* store the foo pointer somewhere for use and free it later,
         * if you are sticking it in a linked list, then you should 
         * free it whenever you free the corresponding node.
         *
         * I am just going to free it here after printing it out
         */
             free(foo); 
        }


        rv = fscanf(fp, "%*[^A-Za-z0-9]");

        if (rv == EOF) 
             break;

    }while(1);
}

See the scanf(3) man page for details.
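For a version that does not rely on the m modifier, the same two-step idea (munge first, then read) can be written with a fixed buffer and an explicit field width. This is only a minimal sketch: next_word is a hypothetical helper name, the scanset is narrowed to letters because the question asks for alphabetic characters only, and the A-Z range syntax inside a scanset is implementation-defined, although common C libraries treat it as a range.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original answer).
 * Reads the next run of letters from fp into buf, which must have
 * room for at least 128 bytes.
 * Returns 1 when a word was stored, 0 when the input is exhausted. */
int next_word(FILE *fp, char *buf)
{
    assert(fp != NULL && buf != NULL);

    /* Munge first: discard any run of non-letters. Matching nothing
     * is fine (returns 0); only EOF means there is no more input. */
    if (fscanf(fp, "%*[^A-Za-z]") == EOF)
        return 0;

    /* Store at most 127 letters plus the terminating '\0'. */
    return fscanf(fp, "%127[A-Za-z]", buf) == 1;
}
```

A caller would loop with while (next_word(fp, c)) linkLst(c);. Because the munging happens before the read, an input such as ")) oops helo" yields "oops" and then "helo". A word longer than 127 letters is returned in several pieces rather than overflowing the buffer.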

Upvotes: 0

Ajay Brahmakshatriya

Reputation: 9213

You will have to create a copy of the string you read, filtering out the non-alphabetic characters.

After the fscanf call, do this (isalpha needs #include <ctype.h>):

char str[50];
int index = 0;
int index2 = 0;
while(c[index] != '\0') {
    if (isalpha((unsigned char)c[index]))
        str[index2++] = c[index];
    else{
        str[index2] = '\0'; 
        if (index2 != 0)
            linkLst(str);
        index2 = 0; 
    }    
    index++;    
}
str[index2] = '\0';
if (index2 != 0)
    linkLst(str);
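The same splitting logic can be packaged as a reusable function so that the loop body stays short. The following is a sketch only: next_alpha_run is a hypothetical name, and it additionally guards against writing past the end of the output buffer, which the fragment above does not need as long as str is as large as c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of the original answer).
 * Copies the next run of letters found at or after s[*pos] into out
 * (at most outsz - 1 characters) and advances *pos past that run.
 * Returns 1 if a word was copied, 0 when the string is exhausted. */
int next_alpha_run(const char *s, size_t *pos, char *out, size_t outsz)
{
    size_t i = *pos;
    size_t n = 0;

    assert(s != NULL && pos != NULL && out != NULL && outsz >= 2);

    /* Skip everything that is not a letter. */
    while (s[i] != '\0' && !isalpha((unsigned char)s[i]))
        i++;

    /* Copy the letters; characters that do not fit are skipped. */
    while (isalpha((unsigned char)s[i])) {
        if (n + 1 < outsz)
            out[n++] = s[i];
        i++;
    }

    out[n] = '\0';
    *pos = i;
    return n > 0;
}
```

With it, the code after each fscanf becomes: size_t pos = 0; while (next_alpha_run(c, &pos, str, sizeof str)) linkLst(str);. A token such as "(Hint:It" from the sample file then produces "Hint" followed by "It".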

Upvotes: 1

Badda

Reputation: 1369

Include the header <ctype.h>, which declares the function isalpha(int c); it returns a nonzero value (true) if c is an alphabetic character.

if ( isalpha(c))
{
     // do what you wanna do
}
else 
{
     // ignore
}

Otherwise, you can use the ASCII table. Treat characters as if they were ints and compare them. For example, to check whether a char is an uppercase letter, you'd do:

if (c >= 65 && c <= 90)
{
    // do what you wanna do
}
else 
{
    // ignore
}

If c is an array, then loop over it like this:

for (int i = 0; c[i] != '\0'; i++) // Then use c[i] to access each char inside it
{
    if (c[i] >= 65 && c[i] <= 90) // or if (isalpha((unsigned char)c[i]))
    {
        // do what you wanna do
    }
    else 
    {
       // ignore
    }
}

You could also write:

if (c[i] >= 'A' && c[i] <= 'Z')

because in the ASCII table 'A' == 65 and 'Z' == 90.
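To compare the two approaches side by side, here is a small sketch. Both function names are hypothetical, and the range check assumes an ASCII-compatible character set, whereas isalpha follows the current locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Range check for uppercase letters; valid only where 'A'..'Z' are
 * contiguous, as they are in ASCII. Hypothetical helper. */
int is_ascii_upper(int ch)
{
    return ch >= 'A' && ch <= 'Z';
}

/* Counts the alphabetic characters in a NUL-terminated string.
 * The cast to unsigned char keeps isalpha well defined for chars
 * with negative values. Hypothetical helper. */
size_t count_letters(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (isalpha((unsigned char)s[i]))
            count++;
    }
    return count;
}
```

Note that the loop stops at the terminating '\0' instead of running to sizeof(c), so bytes left over in the buffer after the string are never examined.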

Upvotes: -1

unalignedmemoryaccess

Reputation: 7441

Filter each character with the isalpha function (declared in <ctype.h>).

while (fscanf(fp, "%s", c) != EOF) {
    char* ptr = c;
    while (*ptr) {
        if (isalpha((unsigned char)*ptr)) {
            linkLst(*ptr);
        }
        ptr++;
    }
}

Then change linkLst to accept a single character instead of a character pointer.

If linkLst must take a char * parameter, you can do it like this:

while (fscanf(fp, "%s", c) != EOF) {
    char* ptr = c;
    char tmp[2] = {0, 0};
    while (*ptr) {
        if (isalpha((unsigned char)*ptr)) {
            tmp[0] = *ptr; 
            linkLst(tmp);
        }
        ptr++;
    }
}
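The loops above hand linkLst one letter at a time. If whole words are wanted instead, a character-by-character reader built on fgetc can group consecutive letters before passing them on. This is a sketch under assumptions: read_word is a hypothetical helper, and the caller chooses the buffer size.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original answer).
 * Reads the next run of letters from fp into out, storing at most
 * outsz - 1 characters. Returns 1 if a word was read, 0 at end of file. */
int read_word(FILE *fp, char *out, size_t outsz)
{
    int ch;
    size_t n = 0;

    assert(fp != NULL && out != NULL && outsz >= 2);

    /* Skip non-letters. fgetc returns the character as an unsigned
     * char converted to int, so it can be passed to isalpha directly. */
    while ((ch = fgetc(fp)) != EOF && !isalpha(ch))
        ;

    /* Collect letters until a non-letter or end of file. */
    while (ch != EOF && isalpha(ch)) {
        if (n + 1 < outsz)
            out[n++] = (char)ch;
        ch = fgetc(fp);
    }

    out[n] = '\0';
    return n > 0;
}
```

The calling loop is then while (read_word(fp, c, sizeof c)) linkLst(c);. The non-letter that ends a word is consumed and dropped, which is harmless here because it would be skipped on the next call anyway.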

Upvotes: -1
