Reputation: 207
I am reading a file using fscanf. I want to ignore non-alphabetic characters such as commas, \, :, and dots.
This is my code:
FILE *fp;
fp = fopen(fl,"r");
char c[50];
while(fscanf(fp, "%s" ,c)!= EOF){
linkLst(c);
}
fclose(fp);
How do I read the file word by word, ignoring non-alphabetic characters?
{ part of the file as follows
WITH ANSWERS:
1) What written language is the most complicated in the world? (Hint: It uses four character sets.) (Is this question too easy?) >> Japanese
2) What language has a vocabulary primarily of Arabic origin (about 70%, I'm told), but uses the Roman alphabet? (I'd like to know where you found the answer!) >> Maltese
3) What non-Romance language uses a tilde (~) over the letter N? >> Estonian }
Upvotes: 1
Views: 3600
Reputation: 22412
I am going to stick to reading the content "word by word" rather than worrying about the linked list.
The scanf family of functions lets you do trivial parsing, so there is no need to go character by character; scanf already offers enough functionality. If you do want to parse the strings char by char, simply use fgets and perform the parsing you need.
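For completeness, here is a minimal sketch of that fgets route. The names split_line, read_words and word_fn are made up for this illustration, and the 256-byte line buffer is an assumption: a word that straddles a longer line would be split in two.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical callback type: receives one NUL-terminated word at a time. */
typedef void (*word_fn)(const char *word);

/* Split one line in place: every run of letters is one word.
 * Calls visit (if not NULL) per word and returns the number of words. */
int split_line(char *line, word_fn visit)
{
    int count = 0;
    char *p = line;

    while (*p != '\0') {
        /* skip anything that is not a letter */
        while (*p != '\0' && !isalpha((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        char *start = p;
        while (isalpha((unsigned char)*p))
            p++;

        /* terminate the word in place, remembering whether the line ended */
        int at_end = (*p == '\0');
        *p = '\0';
        if (visit)
            visit(start);
        count++;
        if (!at_end)
            p++;
    }
    return count;
}

/* Read fp line by line with fgets and split each line into words. */
int read_words(FILE *fp, word_fn visit)
{
    char line[256]; /* assumed maximum line length for this sketch */
    int total = 0;

    while (fgets(line, sizeof line, fp) != NULL)
        total += split_line(line, visit);
    return total;
}
```

In the question's code you would call read_words(fp, ...) with a small wrapper around linkLst as the callback, assuming linkLst copies the string it is given.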
I'll stick with scanf as that's what you're using:
Starting with a simple file (foo.txt) containing:
hello there, how are you?
and trying to scanf it:
/* NOTE: this code does NOT do what you want */
#include <stdio.h>
int main() {
char foo[128];
FILE *fp;
fp = fopen("foo.txt", "r");
do {
fscanf(fp, "%[A-Za-z0-9]", foo);
fprintf(stderr, "foo: %s\n", foo);
} while(1);
return 0;
}
You get an infinite loop printing "foo: hello", because scanf stops at the space after "hello" and can never consume ("mung") it.
So let's add a munger:
#include <stdio.h>
int main() {
char foo[128];
char mung[128];
int rv = 0; /* int, not char: fscanf returns int, and EOF may not fit in a char */
FILE *fp;
fp = fopen("foo.txt", "r");
do {
rv = fscanf(fp, "%[A-Za-z0-9]%[^A-Za-z0-9]", foo, mung);
if(rv == EOF)
break;
fprintf(stderr, "foo: %s\n", foo);
}while(1);
}
So the munger munges everything that is not in the foo character set we are looking for (putting a ^ as the first character inside the [] makes scanf negate the set).
This will print out:
foo: hello
foo: there
foo: how
foo: are
foo: you
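If you want to poke at the two scansets without a file, the same format works with sscanf on a string. This is only a sketch: first_word is a name invented here, the 63-character width is an arbitrary limit, and %n is added to report how many characters the whole format consumed.

```c
#include <stdio.h>

/* Copies the leading alphanumeric run of s into word (at most 63 chars),
 * then discards the non-alphanumeric run that follows it.
 * *consumed receives the number of characters eaten by both directives;
 * it stays 0 if the discard step matched nothing.
 * Returns sscanf's result: 1 if a word was stored. */
int first_word(const char *s, char word[64], int *consumed)
{
    int n = 0;
    int rv = sscanf(s, "%63[A-Za-z0-9]%*[^A-Za-z0-9]%n", word, &n);

    *consumed = n;
    return rv;
}
```

For "hello there, how are you?" this stores "hello" and reports 6 consumed characters: five letters plus the space eaten by the negated set.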
Now if we are being clever we can skip the assignment of the mung variable:
#include <stdio.h>
int main() {
char foo[128];
int rv = 0;
FILE *fp;
fp = fopen("foo.txt", "r");
do {
rv = fscanf(fp, "%[A-Za-z0-9]%*[^A-Za-z0-9]", foo);
if(rv == EOF)
break;
fprintf(stderr, "foo: %s\n", foo);
}while(1);
}
Obviously, my examples assume that each word is shorter than 128 bytes, which we cannot know in advance. However, scanf (with the m modifier, specified in POSIX.1-2008) can dynamically allocate memory for character strings, which you have to free later on, so:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *foo;
int rv = 0;
FILE *fp;
fp = fopen("foo.txt", "r");
do {
/* notice the & before foo, because fscanf will modify the pointer
* variable itself and assign it a new pointer after allocating
* the space for the string
*/
rv = fscanf(fp, "%m[A-Za-z0-9]%*[^A-Za-z0-9]", &foo);
if(rv == EOF)
break;
fprintf(stderr, "foo: %s\n", foo);
/* store the foo pointer somewhere for use and free it later,
* if you are sticking it in a linked list, then you should
* free it whenever you free the corresponding node.
*
* I am just going to free it here after printing it out
*/
free(foo);
}while(1);
}
As BLUEPIXY points out, this would not eat up input that begins with non-matching characters. So something like,
)) oops helo
will get stuck in a loop printing (null).
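The reason is visible in the return value. A small sketch with sscanf (try_word is just a name for this demo, and the 63-character width is arbitrary):

```c
#include <stdio.h>

/* Tries to read one alphanumeric word from the start of s.
 * Returns sscanf's result: 1 if a word was stored, 0 if the first
 * character is not in the set (matching failure, nothing stored). */
int try_word(const char *s, char word[64])
{
    word[0] = '\0'; /* so the caller can see that nothing was written */
    return sscanf(s, "%63[A-Za-z0-9]", word);
}
```

With ")) oops helo" the call returns 0 and consumes nothing, so repeating the same call on a stream would fail in exactly the same place forever. With "oops helo" it returns 1 and stores "oops".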
Which means we need to make munging a separate operation so that it eats things up:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *foo;
int rv = 0;
FILE *fp;
fp = fopen("foo.txt", "r");
do {
/* notice the & before foo, because fscanf will modify the pointer
* variable itself and assign it a new pointer after allocating
* the space for the string
*/
rv = fscanf(fp, "%m[A-Za-z0-9]", &foo);
if(rv == EOF)
break;
/* rv is 1 only when a word was matched; only then has fscanf allocated foo */
if (rv == 1) {
fprintf(stderr, "foo: %s\n", foo);
/* store the foo pointer somewhere for use and free it later,
* if you are sticking it in a linked list, then you should
* free it whenever you free the corresponding node.
*
* I am just going to free it here after printing it out
*/
free(foo);
}
rv = fscanf(fp, "%*[^A-Za-z0-9]");
if (rv == EOF)
break;
}while(1);
}
(See the scanf(3) man page for details.)
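If the m modifier is not available on your platform, a fixed buffer with a width limit gets you most of the way. A rough sketch, not a drop-in: next_word and WORD_MAX are names invented here, the set is letters only (as the question asks), and a word longer than 63 characters would come back in pieces.

```c
#include <stdio.h>

#define WORD_MAX 63 /* assumed longest word; must match the width in the format */

/* Reads the next run of letters from fp into buf (WORD_MAX + 1 bytes).
 * Returns 1 if a word was stored, 0 when there are no more words. */
int next_word(FILE *fp, char buf[WORD_MAX + 1])
{
    /* Munge leading non-letters first. A result of 0 is fine here: it
     * only means nothing was assigned. EOF means the input is exhausted. */
    if (fscanf(fp, "%*[^A-Za-z]") == EOF)
        return 0;

    /* The width keeps fscanf from writing past the end of buf. */
    return fscanf(fp, "%63[A-Za-z]", buf) == 1;
}
```

The caller's loop then becomes while (next_word(fp, c)) linkLst(c); with c declared as char c[WORD_MAX + 1].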
Upvotes: 0
Reputation: 9213
You will have to create a copy of the string you read, filtering out the non-alphabetic characters. After the fscanf call, do this (isalpha needs #include <ctype.h>):
char str[50];
int index = 0;
int index2 = 0;
while(c[index] != '\0') {
if (isalpha((unsigned char)c[index]))
str[index2++] = c[index];
else{
str[index2] = '\0';
if (index2 != 0)
linkLst(str);
index2 = 0;
}
index++;
}
str[index2] = '\0';
if (index2 != 0)
linkLst(str);
Upvotes: 1
Reputation: 1369
Include the header <ctype.h>, which declares the function isalpha(int c). It returns a nonzero value if c is an alphabetic character.
if ( isalpha(c))
{
// do what you wanna do
}
else
{
// ignore
}
Otherwise, you can use the ASCII table. Treat characters as if they were int and compare them. For example, to check whether a char is an uppercase letter, you'd do:
if (c >= 65 && c <= 90)
{
// do what you wanna do
}
else
{
// ignore
}
If c is an array, loop over it like this:
for (int i = 0; c[i] != '\0'; i++) // stop at the terminator; c[i] is the current char
{
if (c[i] >= 65 && c[i] <= 90) // or if (isalpha((unsigned char)c[i]))
{
// do what you wanna do
}
else
{
// ignore
}
}
You could also write :
if (c[i] >= 'A' && c[i] <= 'Z')
because in the ASCII table 'A' == 65 and 'Z' == 90.
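As a sketch of the range-check idea, here are two small helpers. The names is_ascii_upper and is_ascii_alpha are made up for this example, and they assume an ASCII-compatible character set, where 'A' is 65 and 'Z' is 90.

```c
#include <ctype.h>

/* Nonzero if c is an uppercase ASCII letter ('A' is 65, 'Z' is 90). */
int is_ascii_upper(int c)
{
    return c >= 'A' && c <= 'Z';
}

/* Nonzero if c is an ASCII letter of either case. In the default "C"
 * locale this agrees with isalpha() for character codes 0..127. */
int is_ascii_alpha(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```

Writing the bounds as character constants avoids having to remember the numeric codes. isalpha from <ctype.h> remains the more portable choice, since it does not depend on the character set.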
Upvotes: -1
Reputation: 7441
Filter out each character with the isalpha function.
while (fscanf(fp, "%s", c) != EOF) {
char* ptr = c;
while (*ptr) {
if (isalpha((unsigned char)*ptr)) {
linkLst(*ptr);
}
ptr++;
}
}
Then change the linkLst function to accept a single character instead of a character pointer. If linkLst must take a char * parameter, you can do it like this:
while (fscanf(fp, "%s", c) != EOF) {
char* ptr = c;
char tmp[2] = {0, 0};
while (*ptr) {
if (isalpha((unsigned char)*ptr)) {
tmp[0] = *ptr;
linkLst(tmp);
}
ptr++;
}
}
Upvotes: -1