Reputation: 63
I'm currently attempting to write an application to count the number of occurrences of words in an ASCII file (with punctuation stripped and ignoring whitespace). The application should store the word, and the word count in a data structure, which will eventually be sorted in descending order, then printed to a CSV file.
I've made a start on this program, but I'm running into a segmentation fault when attempting to save a new word. Here's my code (I'm aware that this isn't a perfect implementation, I do plan on refining it):
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
#include <errno.h>
#define TRUE 1
#define FALSE 0
/* This program is designed to take an ASCII input file, count the occurrences of words in it
* and write an output file displaying the data. I intend for it to convert uppercase to
* lowercase, so as not to generate duplicate words in the data structure. It should also
* ignore whitespace and punctuation.
*/
void getWords(void);
void printFile(void);
void save(char *input);
struct word {
char *str;
int wc;
};
struct word *warray = NULL;
FILE *infile;
FILE *outfile;
void getWords(void)
{
rewind(infile);
char cw[100]; // Current word storage
int i = 0, j = 0, c;
while((c = fgetc(infile)) != EOF)
{
if(isalpha(c))
{
if(isupper(c))
{
cw[i] = tolower(c);
++i;
}
else
{
cw[i] = c;
++i;
}
}
else
{
if(c == '\n' || c == '\t' || c == ' ')
{
cw[i] = '\0';
i = 0;
save(cw);
for(j = 0; j < cw[99]; j++)
{
printf("%c", cw[j]);
}
}
}
}
}
void printFile(void)
{
int i, c;
printf("Printing the file to be counted in lowercase...\n");
for(i = 0; (c = fgetc(infile)) != EOF; i++)
{
if(ispunct(c) || isdigit(c))
{
++i;
}
else
{
putchar(tolower(c));
}
}
}
void save(char *input)
{
int exists = FALSE, i = 0;
int elements = sizeof(warray)/sizeof(struct word);
if(!warray)
{
warray = malloc(sizeof(struct word));
printf("Made array.\n");
}
else
{
printf("New.\n");
warray = realloc(warray, (elements++)*sizeof(struct word));
}
while(i < elements)
{
printf("in while loop\n");
if(strcmp(input, warray[i].str) == 0)
{
warray[i].wc++;
}
else
{
++i;
}
}
printf("Out while loop\n");
if(strcmp(input, warray[i].str) == 1)
{
printf("Inside save if statement\n");
warray[elements].str = malloc(strlen(input)+1);
strcpy(warray[elements].str, input);
warray[elements].wc = 1;
elements++;
}
}
int main (int argc, char *argv[])
{
if (argc < 3)
{
puts("Please supply the input filename and desired output filename as arguments.");
return 1;
}
infile = fopen(argv[1], "r");
if(infile == NULL)
{
printf("File failed to open. Error: %d\n", errno);
return 1;
}
else
{
puts("File opened successfully.");
printFile();
getWords();
}
return 0;
}
I've put in a few print statements to try and isolate the issue, and it seems to be running into the problem here, inside the save(char *input)
function:
if(strcmp(input, warray[i].str) == 1)
{
printf("Inside save if statement\n");
warray[elements].str = malloc(strlen(input)+1);
strcpy(warray[elements].str, input);
warray[elements].wc = 1;
elements++;
}
I did have a feeling it was because I'd asked strcmp to check if its value == 1, when perhaps I should just be checking for any non-zero value, but I've since tried that and I'm still getting the segmentation fault.
I'd appreciate it if anyone could point me in the right direction, and thanks in advance!
Upvotes: 3
Views: 355
Reputation: 1015
There are several logical flaws in your implementation. From your code, I assumed that you wanted to do the following:
warray
is empty. If empty, allocate a single element.But your code does the following.
if(!warray)
{
warray = malloc(sizeof(struct word));
printf("Made array.\n");
}
This part is ok.
else
{
printf("New.\n");
warray = realloc(warray, (elements++)*sizeof(struct word));
}
This should not be here. You should check for duplicate first, then allocate if needed.
while(i < elements)
{
printf("in while loop\n");
if(strcmp(input, warray[i].str) == 0)
{
warray[i].wc++;
}
else
{
++i;
}
}
This is WRONG. If the word already exists, then it will stuck in warray[i].wc++;
line. You should return after incrementing the counter.
if(strcmp(input, warray[i].str) == 1)
{
printf("Inside save if statement\n");
warray[elements].str = malloc(strlen(input)+1);
strcpy(warray[elements].str, input);
warray[elements].wc = 1;
elements++;
}
This is also WRONG. After the previous loop, value of i
will be equal to the value of elements
. But array indexes are from 0
to elements-1
. So warray[i]
and warray[elements]
both will cause segmentation fault. (you incremented value of elements
earlier on line warray = realloc(warray, (elements++)*sizeof(struct word));
)
NOTE: for(j = 0; j < cw[99]; j++)
in function getwords
may also cause segmentation fault.
EDIT: I did not noticed the post-incrementing problem earlier. It should be
warray = realloc(warray, (++elements)*sizeof(struct word));
instead of
warray = realloc(warray, (elements++)*sizeof(struct word));
Thanks to Chronos.
Upvotes: 2
Reputation: 1821
Okay, let me see if I can help. At a quick run-through, I see three obvious MAJOR issues!
First, in getWords
, in the last for-loop ("for(j = 0;...
"), the terminal condition is "j < cw[99]
"... I suspect you meant "j < 100
". We have NO idea what value will be in c[99], or if the input string was even long enough to have REACHED the last element of the array!
Second, in save
, in the first else-clause, it seems you are attempting to increase the size of warray
by one element... however, because you are POST-decrementing the variable elements
, the array is NOT resized. If you PRE-increment elements
instead, it should fix the problem.
warray = realloc(warray, (++elements)*sizeof(struct word));
Third, also in save
, it appears that your intention is to only increment the count of a word that has appeared previously...however, you have already increased the size of the array at that point, so you are using up memory resources unnecessarily.
The first two will cause your program to access memory outside the range of what is intended for your program, and could cause your system to crash, or at least very unpredictable system behavior.
There may be more, but that's should get you moving forward...
Upvotes: 2
Reputation:
One problem is the fact that you keep reallocating no words:
int elements = sizeof(warray)/sizeof(struct word);
sizeof(warray)
will be the size of a pointer, which never changes. Since sizeof(struct word)
is sizeof(pointer)+padding+sizeof(int)
, you're performing sizeof(pointer) / (sizeof(pointer)+padding+sizeof(int))
, which can be like saying 4 / (4+0+4)
, or 4/8
in the trivial case. Because of the rules of integer division, you're effectively setting elements
to 0 every time the save
function is called, and as a result, you're doing malloc(0)
, which is undefined behavior. If it returns NULL
, any line using warray[i]
can cause a segfault. It might return a non-NULL
value, but the returned pointer could point to unallocated memory.
Storing the number of elements outside of the save
function will allow you to keep track of the number of elements in the array.
Also, your realloc
line is wrong. By doing elements++
, you're saying that if the number of elements was previously 1, you should only allocate 1, and elements
is incremented sometime before the next sequence point. What you want is ++elements
, which increments the number of elements before doing the allocation (e.g. you have 1 and now you want 2).
There may be other bugs, but those were the ones I noticed.
Upvotes: 1