I have been writing a program that reads a file and searches it for keywords. I have it all figured out except how to detect blank lines. I am currently using "" as the keyword, but of course that matches every line, so it just counts the total lines. The only other thing I could think of was "\n" as a keyword, but I believe that would not work either.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
    char keywords[16][10]={"","","//","/*","int","char","float","double","if","else","for","switch",";","struct","[]","{}"};
    char str[1000];
    int variable=0;
    int character=0;
    int ifStatement=0;
    int forStatement=0;
    int switchStatement=0;
    int elseStatment=0;
    int semiColon=0;
    int arrays=0;
    int blocks=0;
    int blankLines=0;
    int comments=0;
    int totalLines=0;
    int structs=0;
    FILE * fpointer;
    fpointer = fopen("test2.txt", "r");
    while(!feof(fpointer))
    {
        fgets(str,sizeof(str),fpointer);
        puts(str);
        if(strstr(str,keywords[0]))
        {
            totalLines++;
        }
        if(strstr(str,keywords[1]))
        {
            blankLines++;
        }
        if(strstr(str,keywords[2]))
        {
            comments++;
        }
        if(strstr(str,keywords[3]))
        {
            comments++;
        }
        if(strstr(str,keywords[4]))
        {
            variable++;
        }
        if(strstr(str,keywords[5]))
        {
            variable++;
        }
        if(strstr(str,keywords[6]))
        {
            variable++;
        }
        if(strstr(str,keywords[7]))
        {
            variable++;
        }
        if(strstr(str,keywords[8]))
        {
            ifStatement++;
        }
        if(strstr(str,keywords[9]))
        {
            elseStatment++;
        }
        if(strstr(str,keywords[10]))
        {
            forStatement++;
        }
        if(strstr(str,keywords[11]))
        {
            switchStatement++;
        }
        if(strstr(str,keywords[12]))
        {
            semiColon++;
        }
        if(strstr(str,keywords[13]))
        {
            structs++;
        }
        if(strstr(str,keywords[14]))
        {
            arrays++;
        }
        if(strstr(str,keywords[15]))
        {
            blocks++;
        }
    }
    printf("Number of total lines = %d\n",totalLines);
    printf("Number of blank lines = %d\n",blankLines);
    printf("Number of comments = %d\n",comments);
    printf("Number of variables = %d\n",variable);
    printf("Number of if statements = %d\n",ifStatement);
    printf("Number of else statements = %d\n",elseStatment);
    printf("Number of for statements= %d\n",forStatement);
    printf("Number of switch statements = %d\n",switchStatement);
    printf("Number of semi colons = %d\n",semiColon);
    printf("Number of structs = %d\n",structs);
    printf("Number of arrays = %d\n",arrays);
    printf("Number of blocks = %d\n",blocks);
    fclose(fpointer);
    return 0;
}
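Why neither keyword can work: the C standard specifies that strstr returns its first argument when the search string is empty, so "" matches every line, and "\n" matches every line that ends in a newline, blank or not. The sketch below mirrors the question's per-line strstr test; the helper name count_matching_lines is hypothetical.

```c
#include <string.h>

/* Counts how many of the n lines contain `needle`, using the same
   strstr test as the question. With an empty needle, strstr returns
   the line itself, so every line is counted. */
size_t count_matching_lines(const char *lines[], size_t n, const char *needle)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (strstr(lines[i], needle) != NULL)
            count++;
    return count;
}
```

With the lines "int x;\n", "\n", "   \n" and a final "x = 1;" without a newline, "" counts 4 lines and "\n" counts 3, while only 2 of them are blank.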
Let's put some method into this. There were a number of unused variables, so I removed them. Declaring argc and argv suggests you actually want to pass the file name on the command line, so I used that. If not, your main should be int main(void) to indicate that you are not using the optional arguments.
Always check whether the file could be opened successfully; otherwise you will get unexpected errors in strange places.
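A minimal sketch of both checks (a file name argument is present, and fopen succeeded), pulled out into a helper. The function name open_input and the program name "count" in the usage text are illustrative assumptions; perror reports why fopen failed using the system's own error text.

```c
#include <stdio.h>

/* Opens the file named on the command line for reading.
   On failure it prints a message to stderr and returns NULL,
   so the caller only has to test the returned pointer. */
FILE *open_input(int argc, char *argv[])
{
    FILE *fp;

    if (argc < 2) {
        fprintf(stderr, "usage: count [input-file]\n");
        return NULL;
    }
    fp = fopen(argv[1], "r");
    if (fp == NULL)
        perror(argv[1]);   /* prefixes the system error text with the file name */
    return fp;
}
```

In main this becomes `FILE *fpointer = open_input(argc, argv); if (!fpointer) return EXIT_FAILURE;`.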
Never use while (!feof(...)). fgets itself tells you when you have reached the end of the file, by returning NULL.
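A sketch of the reading loop with fgets as the loop condition. feof only becomes true after a read has already failed, so while (!feof(...)) runs the body once more after the last line; because a failed fgets leaves the buffer unchanged, the question's loop typically processes the last line twice. The function name count_lines is hypothetical.

```c
#include <stdio.h>

/* Counts the lines in fp (a line longer than the buffer is counted
   once per piece). fgets returns NULL at end of file or on a read
   error, which ends the loop before a stale buffer can be used. */
int count_lines(FILE *fp)
{
    char buf[1000];
    int lines = 0;

    while (fgets(buf, sizeof buf, fp) != NULL)
        lines++;
    return lines;
}
```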
With that out of the way: one method to check for a blank line (that is, one that contains nothing other than whitespace) is to loop over the line and call isspace for each character. isspace is declared in ctype.h and conveniently tells you whether a character is one of C's predefined whitespace characters; see cppreference for a full description.
When the loop checking for whitespace ends, ptr points at the character where it stopped: either the first non-whitespace character or the terminating '\0'. If it is '\0', you are at the end of the string and the line is entirely blank; if not, there may be something interesting after all.
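The same check as a standalone function, in a minimal sketch; the name is_blank_line is hypothetical. It casts each character to unsigned char before calling isspace, because passing a negative char value (possible for non-ASCII bytes where char is signed) to the ctype.h functions is undefined behavior.

```c
#include <ctype.h>

/* Returns 1 if `line` contains nothing but whitespace, including the
   trailing '\n' that fgets keeps; returns 0 otherwise. */
int is_blank_line(const char *line)
{
    const char *ptr = line;

    while (*ptr != '\0' && isspace((unsigned char)*ptr))
        ptr++;
    /* ptr now points at the first non-whitespace character or at '\0' */
    return *ptr == '\0';
}
```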
To get rid of the endless list of ifs, I restructured your code into a simple, easily maintainable table that can be processed in a loop. Each entry contains a char * to the string to search for (fixed-length strings are not necessary here) and a pointer to the variable that should be incremented when that string is found.
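The table idea in isolation, as a sketch: the field names follow the answer's code, while the struct tag keyword and the function count_keywords are hypothetical. This version counts each phrase at most once per line and does no whole-word checking.

```c
#include <string.h>

/* One entry per search phrase: what to look for and which counter to bump. */
struct keyword {
    const char *search;
    int *incrementMe;
};

/* Adds 1 to the counter of every entry whose phrase occurs in `line`.
   Several entries may share one counter (e.g. "//" and the C comment opener). */
void count_keywords(const char *line, struct keyword *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(line, table[i].search) != NULL)
            (*table[i].incrementMe)++;
}
```

Adding a new keyword then means adding one table row instead of another if block.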
One problem that remains is that strstr does not care whether the text is found as an entire word (for) or as part of a longer text (forStatement).
To tackle this, I added a boolean member asWord and set it to 1 to enable word checking. In the check itself, I test whether the character before or after the found phrase satisfies isalnum() (in the default "C" locale, the set 0..9, A..Z, a..z). This may or may not be valid for your purposes, but it can easily be extended to include other characters.
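The whole-word test as a separate function, in a minimal sketch; the name is_whole_word is hypothetical. As in the answer, only letters and digits count as word characters, so an underscore (as in for_each) would still let "for" match; add a check for '_' if that matters.

```c
#include <ctype.h>
#include <string.h>

/* `found` is a pointer into `line` returned by strstr and `len` is the
   length of the searched phrase. Returns 1 if the match is neither
   preceded nor followed by a letter or digit. */
int is_whole_word(const char *line, const char *found, size_t len)
{
    if (found > line && isalnum((unsigned char)found[-1]))
        return 0;
    if (isalnum((unsigned char)found[len]))   /* '\0' is not alphanumeric */
        return 0;
    return 1;
}
```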
Another problem is that each phrase would only get counted once per line. To counter that, I initialize ptr to the start of each newly read line, advance it whenever a match is found, and then search again. If the phrase must match as a whole word (asWord) and this occurrence is part of a longer word, the loop skips that character. (It could just as well skip the entire phrase; I realize now I could make it ptr += strlen(keywords[i].search); so it would run marginally faster.)
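A sketch that combines the repeated search with the whole-word test and advances past each match by the phrase length, as suggested above. The function name count_occurrences is hypothetical; note that skipping by the full length means overlapping matches are not counted.

```c
#include <ctype.h>
#include <string.h>

/* Counts every occurrence of `word` in `line`. When asWord is nonzero,
   matches that touch a letter or digit on either side are skipped. */
int count_occurrences(const char *line, const char *word, int asWord)
{
    size_t len = strlen(word);
    const char *ptr = line;
    int count = 0;

    if (len == 0)
        return 0;   /* an empty phrase would match at every position */
    while ((ptr = strstr(ptr, word)) != NULL) {
        if (!asWord ||
            ((ptr == line || !isalnum((unsigned char)ptr[-1])) &&
             !isalnum((unsigned char)ptr[len])))
            count++;
        ptr += len;   /* resume the search just past this match */
    }
    return count;
}
```

For example, "int x; int y; print(x);" contains "int" three times, but only two of those are whole words.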
Your strstr idea does not work as intended with the two pairs [] and {}! Those would only get counted if they appeared literally, like that, in the input file. So I removed the closing brackets. You can't add them as separate entries for "]" and "}", because then the "Number of arrays" and "Number of blocks" would be wrong.
(You could add a separate loop to count these, but what would 'number of blocks' mean if there is no } for each {?)
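For example, strstr("int a[3];", "[]") returns NULL, because "[]" never appears literally in that line. A separate loop that counts opening and closing characters independently could look like the sketch below; the name count_pair is hypothetical, and braces inside strings and comments are counted too.

```c
/* Adds the number of `open` and `close` characters in `line` to the
   two counters. Comparing the totals after the whole file has been
   read gives a rough check for unbalanced brackets. */
void count_pair(const char *line, char open, char close, int *nopen, int *nclose)
{
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == open)
            (*nopen)++;
        else if (*p == close)
            (*nclose)++;
    }
}
```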
A final adjustment is to replace your puts with a simple printf("%s", str). The line as read by fgets will usually end with a newline character (*), and puts adds another one after that.
(*) Except when the input line is longer than 998 characters: with a 1000-byte buffer, fgets stores at most 999 characters (newline included) plus the terminating '\0', so a longer line arrives in pieces. Even then my printf will do the right thing and neatly join the pieces again, where puts would separate them. If you have such long lines, your counting routine needs adjusting, because a word may fall in the middle of such a split.
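If long lines matter, the program first has to know when fgets returned only part of a line. A minimal sketch, assuming a hypothetical helper read_line with its own return convention: a piece is complete when it ends in '\n', or when the file ended without a final newline. Echoing each piece with printf("%s", buf) or fputs(buf, stdout) rejoins the pieces on output.

```c
#include <stdio.h>
#include <string.h>

/* Reads the next piece of a line into buf. Returns 1 if the piece ends
   the line, 0 if fgets stopped because buf was full (so more of the
   line may follow), and -1 at end of file. */
int read_line(char *buf, int size, FILE *fp)
{
    size_t len;

    if (fgets(buf, size, fp) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return 1;
    return feof(fp) ? 1 : 0;   /* last line without a trailing newline */
}
```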
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int main(int argc, char *argv[])
{
    char str[1000];
    int variable=0;
    int ifStatement=0;
    int forStatement=0;
    int switchStatement=0;
    int elseStatement=0;
    int semiColon=0;
    int arrays=0;
    int blocks=0;
    int blankLines=0;
    int comments=0;
    int totalLines=0;
    int structs=0;
    struct {
        char *search;
        int *incrementMe;
        int asWord;
    } keywords[] = {
        { "//", &comments, 0 },
        { "/*", &comments, 0 },
        { "int", &variable, 1 },
        { "char", &variable, 1 },
        { "float", &variable, 1 },
        { "double", &variable, 1 },
        { "if", &ifStatement, 1 },
        { "else", &elseStatement, 1 },
        { "for", &forStatement, 1 },
        { "switch", &switchStatement, 1 },
        { ";", &semiColon, 0 },
        { "struct", &structs, 1 },
        { "[", &arrays, 0 },
        { "{", &blocks, 0 }
    };
    size_t i;
    char *ptr;
    FILE * fpointer;
    if (argc == 1)
    {
        printf ("usage: count [input-file]\n");
        return 0;
    }
    fpointer = fopen(argv[1], "r");
    if (!fpointer)
    {
        printf ("unable to open file \"%s\"\n", argv[1]);
        return -1;
    }
    while (1)
    {
        if (!fgets(str,sizeof(str),fpointer))
            break;
        printf ("%s", str);
        totalLines++;
        ptr = str;
        while (*ptr)
        {
            if (!isspace(*ptr))
                break;
            ptr++;
        }
        if (!*ptr)
            blankLines++;
        else
        {
            for (i=0; i<sizeof(keywords)/sizeof(keywords[0]); i++)
            {
                ptr = str;
                while ((ptr = strstr (ptr, keywords[i].search)))
                {
                    if (keywords[i].asWord)
                    {
                        if (ptr > str && isalnum(ptr[-1]))
                        {
                            ptr++;
                            continue;
                        }
                        if (isalnum(ptr[strlen(keywords[i].search)]))
                        {
                            ptr++;
                            continue;
                        }
                    }
                    ptr++;
                    (*keywords[i].incrementMe)++;
                }
            }
        }
    }
    printf("Number of total lines = %d\n",totalLines);
    printf("Number of blank lines = %d\n",blankLines);
    printf("Number of comments = %d\n",comments);
    printf("Number of variables = %d\n",variable);
    printf("Number of if statements = %d\n",ifStatement);
    printf("Number of else statements = %d\n",elseStatement);
    printf("Number of for statements= %d\n",forStatement);
    printf("Number of switch statements = %d\n",switchStatement);
    printf("Number of semi colons = %d\n",semiColon);
    printf("Number of structs = %d\n",structs);
    printf("Number of arrays = %d\n",arrays);
    printf("Number of blocks = %d\n",blocks);
    fclose(fpointer);
    return 0;
}
Running this on itself ought to give this result (perhaps plus or minus a blank line, depending on how you copied it):
Number of total lines = 126
Number of blank lines = 13
Number of comments = 2
Number of variables = 24
Number of if statements = 10
Number of else statements = 3
Number of for statements= 3
Number of switch statements = 2
Number of semi colons = 56
Number of structs = 2
Number of arrays = 14
Number of blocks = 28
As you can see, there is at least one major flaw left. It correctly counts occurrences of for and skips those in forStatement (3 times), but the word for occurs only once as part of the program; the other 2 are inside strings. If you are writing this to work on source code (and not on 'normal' text files), you need to take strings into account as well.
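One possible way to take strings into account is to blank out the contents of string literals before running the keyword search on a line. A minimal sketch, assuming string literals never span lines; character literals such as '"' and comments are not handled, and the helper name blank_strings is hypothetical. The output has the same length as the input, so match positions still line up with the original line.

```c
/* Copies src to dst, replacing every character inside a double-quoted
   string literal with a space while keeping the quotes themselves.
   Backslash escapes such as \" do not end the string.
   dst must have room for strlen(src) + 1 characters. */
void blank_strings(const char *src, char *dst)
{
    int in_string = 0;

    for (; *src != '\0'; src++, dst++) {
        if (in_string) {
            if (*src == '\\' && src[1] != '\0') {
                *dst++ = ' ';   /* blank the backslash ... */
                src++;          /* ... and the character it escapes */
                *dst = ' ';
                continue;
            }
            if (*src == '"') {
                in_string = 0;
                *dst = '"';
            } else {
                *dst = ' ';
            }
        } else {
            if (*src == '"')
                in_string = 1;
            *dst = *src;
        }
    }
    *dst = '\0';
}
```

In the main loop you would call blank_strings(str, clean) on each line read and run the strstr searches on clean, while still echoing the original str.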