Reputation: 1700
I wrote a binary search tree to store some sorted words. As is often the practice, I do this by allocating new block of memory for the binary tree every time a new word come in. But, strangely, I can only allocating new memory for the binary search tree twice, which means that at the first and second time everything was fine but the program crash at the third memory allocation.
Here is my code:
inputWord.c
/* I pass in the firstNode, and the word I wanna store, and its quantity as argument*/
int inputWord(BSTnode* Node,char* word,int num){
BSTnode* ptr=Node; //ptr was defined to track the location of the node.
while(1){
if(stricmp(word,ptr->data)>0){
/*If the current node already have a rightchild then ptr move to it, and do comparison again*/
if(ptr->rightchild!=NULL){
ptr=ptr->rightchild;
printf("Moving to another (right) node now!!\n");
continue;
}
/*If the current node have no rightchild, then make a new one for it and store the word and its quantity*/
else{
ptr->rightchild=malloc(sizeof(BSTnode));
if(!(ptr->rightchild))
return 1;
ptr=ptr->rightchild;
ptr->rightchild=NULL;
ptr->leftchild=NULL;
strcpy(ptr->data,word);
ptr->num=num;
break;
}
}
else if(stricmp(word,ptr->data)<0){
/*it's all the same as the rightchild part*/
if(ptr->leftchild!=NULL){
ptr=ptr->leftchild;
continue;
}
else{
ptr->leftchild=malloc(sizeof(BSTnode));
if(!(ptr->leftchild))
return 1;
ptr=ptr->leftchild;
ptr->leftchild=NULL;
ptr->rightchild=NULL;
strcpy(ptr->data,word);
ptr->num=num;
break;
}
}
/*If the word have already been stored in the tree, print out this message*/
else{
fprintf(stdout,"It is exactly the same word!!\n");
return 0;
}
}
return 0;
}
I have make some necessary comments above to help you understand my intention.Hopefully that would help.
As you can see, that function was pretty straight and simple. And it did work for the first two invokation.But it crash when invoked the third time!!(always the third time).
So I made some test. And now I am pretty sure that it crash at the line
ptr->leftchild=malloc(sizeof(BSTnode));
(make it clear that the data offirstNode
is initialize with ""
for comparison. And I pass in the word "The
" first and "Project
" second and "Gutenberg
" third. And the structure of BSTnode
is
typedef struct BSTnode{
char data[20];
struct BSTnode* leftchild;
struct BSTnode* rightchild;
int num;
}BSTnode;
)
How I make that test is listed as below. (It is the same code, only with some extra print
statement for test)
int inputWord(BSTnode* Node,char* word,int num){
printf("Enter inputWord() successfully!!\n");
BSTnode* ptr=Node;
while(1){
if(stricmp(word,ptr->data)>0){
if(ptr->rightchild!=NULL){
ptr=ptr->rightchild;
printf("Moving to another (right) node now!!\n");
continue;
}
else{
printf("I need a new rightchild!!\n");
ptr->rightchild=malloc(sizeof(BSTnode));
printf("New rightchild created successfully!!\n");
if(!(ptr->rightchild))
return 1;
ptr=ptr->rightchild;
ptr->rightchild=NULL;
ptr->leftchild=NULL;
printf("......In line 27 now!!\n");
strcpy(ptr->data,word);
printf("Copied successfully!!!..In line 29 now!!\n");
ptr->num=num;
fprintf(stdout,"New data '%s' successfully inserted into a new (right) node at %p (value of pointer)\n",word,ptr);
break;
}
}
else if(stricmp(word,ptr->data)<0){
if(ptr->leftchild!=NULL){
ptr=ptr->leftchild;
printf("Moving to another (left) node now!!\n");
continue;
}
else{
printf("I need a new left child!!!\n");
ptr->leftchild=malloc(sizeof(BSTnode));
printf("New leftchild created successfully!!\n");
if(!(ptr->leftchild))
return 1;
ptr=ptr->leftchild;
ptr->leftchild=NULL;
ptr->rightchild=NULL;
printf("......In line 47 now!!\n");
strcpy(ptr->data,word);
printf("Copied successfully!!!..In line 51 now!!\n");
ptr->num=num;
fprintf(stdout,"New data '%s' successfully inserted into a new (left) node at %p (value of pointer)\n",word,ptr);
break;
}
}
else{
fprintf(stdout,"Nothing else to insert!!\n");
return 0;
}
}
return 0;
}
As you can see, with some print
statements telling me where have I been, I can be sure where the program crash.
Any idea why it always crash at the third time?
#######################################################################3main.c
#include<stdlib.h>
#include<stdio.h>
#include<string.h>
#include<stdbool.h>
#include "wordCount.h"
void prompt(BSTnode*,FILE*);
char arr[20]={0};
int main()
{
BSTnode* firstNode=malloc(sizeof(BSTnode));
firstNode->leftchild=NULL;
firstNode->rightchild=NULL;
strcpy(firstNode->data,"");
firstNode->num=0;
FILE* fs=fopen("testfile.txt","r");
if(!fs){
printf("Failed to open fiel!!\n");
return 2;
}
while(1){
if(ferror(fs))
perror("there is a error in fs in the beginning of while loop!\n");
prompt(firstNode,fs);
}
return 0;
}
void prompt(BSTnode* Node,FILE* fs){
int i=0;
printf("Please select\n1.find and input a word into the binary tree\n2.print only one data\n3.Exit\n");
if(scanf("%d",&i)!=1){
printf("scanf failed!!\nplease input a valid number!!\n");
//fflush(stdin);
return;
}
getchar();
switch(i){
case 1:
{
memset(arr,'\0',20); //since the "arr" is used to hold the newWord founded and returned, it should be clear first every time
char* newWord=findWord(fs);
int totalNumberOfTheWord=wordCount(fs,newWord);
inputWord(Node,newWord,totalNumberOfTheWord);
break;
}
case 2:
printOneNode(Node);
break;
case 3:
exit(0);
default:
printf("Please input a valid number!(1-3)");
}
}
Also, the wordCount.h:
#ifndef WORDCOUNT_H
#define WORDCOUNT_H
#include<stdlib.h>
#include<stdio.h>
typedef struct BSTnode{
char data[20];
struct BSTnode* leftchild; //if less than, put it on the left
struct BSTnode* rightchild; //if greater than, on the right
int num;
}BSTnode;
int inputWord(BSTnode*,char*,int);
char* findWord(FILE*);
int wordCount(FILE*,char*);
int printOneNode(BSTnode*);
#endif
The function prompt()
is used to prompt the user to decide whether to continue word-searching.
full source code:
wordCount.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include "wordCount.h"
int wordCount(FILE* fs,char* word)
{
int num=0;
rewind(fs);
size_t n1=sizeof(word);
size_t n2=strlen(word);
char* buff=malloc(n1) ;
if(buff==NULL)
return 1;
memset(buff,'\0',n1);
/* I count the word by moving byte by byte and do comparison*/
if (fs != NULL) {
if (n2 == fread(buff, 1,n2, fs)) {
do {
if (strnicmp(buff,word,n2) == 0)
num++;
memmove(buff, buff+1,n2-1);
} while (1 == fread(buff+n2-1, 1, 1, fs));
// I think I might optimize
// this using KMP algorithm
}
}
free(buff);
return num;
}
findWord.c
#include<string.h>
#include<stdio.h>
#include<stdbool.h>
#include<stdlib.h>
#include "wordCount.h"
extern char arr[20];
char* findWord(FILE* fs)
{
static long pos=0;
fseek(fs,pos,SEEK_SET);
if(ferror(fs)){
perror("fseek() failed!!!\n");
fprintf(stderr,"fseek() failed in file %s\n",__FILE__);
exit(EXIT_FAILURE);
}
char chr[1]={0};
bool flag1=false;
bool flag2=false;
while((1==fread(chr,1,1,fs))&&(!(flag1==false&&flag2==true))){
// This would make the findword() function
// find only a single word once
if(chr[0]!=32){
strncat(arr,chr,1);
flag2=true;
flag1=true;
}
else
flag1=false;
}
/*the key method that I use to find a new word is that I use two 'bool' flags: flag1 and flag2.
*Only when the "arr" is filled only with character, not a single space, will the flag1 be false and flag2 be true, thus breaking the while loop*/
pos=ftell(fs)-1;
//maybe everytime you use "fseek()", "ftell()", the
//file-position will move one byte ahead.
return arr;
}
printOneNode.c
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include"wordCount.h"
int printOneNode(BSTnode* Node){
BSTnode* ptr=Node;
while(1){
printf("Select which side of node do you want to print now(l/r)?(q for quit) ");
char a;
getchar(); //this is used to consume the newline character left
//fflush(stdin);
if(scanf("%c",&a)!=1){
printf("scanf failed!!");
return 1;
}
switch(a){
case 'l':
{
if(ptr->leftchild!=NULL){
ptr=ptr->leftchild;
printf("\t%s\n",ptr->data);
}
else
printf("There is no more leftchild\n");
break;
}
case 'r':
{
if(ptr->rightchild!=NULL){
ptr=ptr->rightchild;
printf("\t%s\n",ptr->data);
}
else
printf("There is no more rightchild!\n");
break;
}
case 'q':
return 0;
default:
return 0;
}
}
}
The function findWord()
is used to find a new word for insertion. For example, if there is string This is a lovely place...
in the textfile.txt
, then the findWord()
would first find out a word This
and then is
secondly and then a
thirdly, etc. (This is the reason why I define the pos
as a static variable to keep track of the location.)
The function wordCount()
is used to count out how many time those the word returned by findWord()
appear in the testfile.txt
.
The function printOneNode()
is used to print out the data of one single node according to the user's willingness. I designed this function but haven't use it yet, which mean that in the prompt()
function I always choose to "find and input a new word into the binary search tree"). So this may not the reason that cause my program to crash "occasionally".
As summary, my routine is:
testfile.txt
using findWord()
wordCount()
inputWord()
Repeat that.
I cannot make this program smaller any more to make it more understandable, because it have to find a word and count it insert it. But you can ignore that printOneNode()
function, to some extent.
As for the testfile.txt
, I have posted the link below at the comment area. Thanks
Upvotes: 1
Views: 261
Reputation: 15642
edit: This is an amendment to my previous post (found below), detailing the more severe issues found in this code.
In wordCount
there is a buffer overflow. Buffer overflows are UB.
n1
bytes for buff
to point at. By chance, do you happen to know how many bytes that is? Perhaps you should check, and then answer this to yourself: How many bytes can you store in that object?n2
bytes into buff
. Which is greater, n1
or n2
? Have you looked at that? What happens if you try to fit 24 eggs into a carton that only holds 12?I think the problem here is that you don't understand the sizeof
operator; it isn't a function... Rather, it is an operator much like the &address-of
and the -negation
operator, except that sizeof
operates on the type of (or denoted by) an expression; it evaluates to the size of objects of that type.
To clarify, in the following fragment of code, n1
is sizeof (char *)
, which is probably not what you intended.
int wordCount(FILE* fs,char* word)
{
int num=0;
rewind(fs);
size_t n1=sizeof(word);
size_t n2=strlen(word);
char* buff=malloc(n1);
inputWord
seems to operate under the impression that word
points to a string, however that value seems to come from findWord
in your program, which doesn't necessary produce a string (because it uses strncat
). More undefined behaviour! Is this surprising?
Previous answer:
Firstly, this code doesn't even compile. You're missing a semicolon immediately following inputWord(Node,newWord,totalNumberOfTheWord)
within prompt
. Perhaps you haven't noticed the errors, and you're running an out-of-date binary which we don't have the source code for?
Secondly, even if this code were to compile, there are a number of instances of undefined behaviour such as:
malloc
returns NULL
and you attempt to modify NULL
points toBSTnode* firstNode=malloc(sizeof(BSTnode));
followed immediately by firstNode->leftchild=NULL;
. Perhaps you could declare firstNode
like so: BSTnode firstNode = { 0 };
and create a pointer to it using &firstNode
... After all, you really should choose the most appropriate storage duration rather than defaulting to malloc
every time. On that note, I highly recommend separating your allocation logic from your data structure logic; if you need further elaboration, consider how scanf
is designed.fflush(stdin);
. Whenever you use a function for the first time, you should always read and understand the manual very carefully... and that's not just to provide insight on how you should be designing your functions. If you had read and fully understood this fflush
manual prior to using fflush
, you would have never used this problematic code. Consider using something like scanf("%*[^\n]"); getchar();
in its place.%p
format directive, which expects a void *
pointer as a corresponding argument. The corresponding argument you're providing, however, is of type struct BSTnode *
. According to the fprintf
manual, "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."Even if you don't fix these undefined behaviours, this code may coincidentally work on your system when you provide dummy functions in place of findWord
and wordCount
. However, it's not required to work the same way on all systems, which means for you the crash might occur where for us it doesn't. Fix those problems.
Those problems indicate that your findWord
and wordCount
functions aren't necessarily trustworthy and foolproof, either; they might work for you in one setting whilst failing for you in another, or worse yet, perhaps they're stale too! You should have verified that the problem is where you think it is by providing dummy functions in their places. That is, after all, part of the process of creating an MCVE so that your question doesn't get closed.
No, I won't be interested in starting a bounty on this question because it's of extremely poor quality; as I previously mentioned, this question relies upon syntactically erroneous code compiling correctly, so we can't reproduce the result you see. Even if we fix the syntax errors, we'd have to fill in the blanks (that's your work) which introduces an aspect of uncertainty into any possible answers. About the only thing I am interested in starting for this question is the process of having it closed.
Upvotes: 4