Reputation: 3
Long story short, I'm trying to allocate memory dynamically to store the user input organized into two structs, word
and sentence
. Here are the struct definitions for them:
struct word {
char* str; // Pointer to the character array
int length; // The length of the word
int posInSentence; // Where in the sentence this word is located
};
struct sentence {
struct word* words; // Pointer to the words in this sentence
int length; // The length of the sentence
int line; // Which line this sentence is located in the collection
};
My program then calls takeInput()
which, as the name suggests, takes the user's input:
struct sentence* takeInput(int* length) {
int n = 0;
struct sentence* out = malloc(sizeof(struct sentence));
while (1) {
out = (struct sentence*) realloc(out, (n + 1) * sizeof(struct sentence));
char str[1000]; // The one execption where a defined array length is allowed
printf("Enter your string: ");
gets(str); // I know gets() is bad, but I can't be bothered to get scanf to work
if (str[0] == '\0') { // Check to see if array is empty, since that means the end of user input
break;
}
*(out + n) = splitString(str, (n + 1));
printSentence((out + n), false);
n++;
}
*length = n;
return out;
}
This in turn calls splitString()
to split the input into words and places them into a sentence:
struct sentence splitString(char* in, int line) {
struct sentence out;
char* split; // Pointer to the string tokens being split off with strtok()
char delim = ' '; // Delimiter for strtok()
int n = 0; // Used to count the number of words entered and for setting the position in sentence for each word
struct word* w = (struct word *) malloc(sizeof(struct word)); // Pointer used for the array of words
split = strtok(in, &delim); // Get the first token
w->str = split;
w->length = strlen(split);
w->posInSentence = n + 1;
n++;
while (1) {
w = (struct word *) realloc(w, (n + 1) * sizeof(struct word));
split = strtok(NULL, &delim); // Get all tokens after
if (split == NULL) {
break; // Break if we've reached the end of the string
}
(w + n)->str = split;
(w + n)->length = strlen(split);
(w + n)->posInSentence = n + 1;
n++;
}
out.words = w;
out.length = n;
out.line = line;
return out;
}
Printing out the sentences as they're entered is no issue:
Enter your string: Hello world
Hello
world
Enter your string: New Day
New
Day
The issue comes when you try to print out the list of sentences which is returned by takeInput()
. As you can see by the GDB memory, malloc is writing over the same memory (address 0x7fffffffdb60) to store both times the user inputs a line (pay attention to address 0x55555555ae0 and 0x555555559c50, they both point to the words in each sentence)
Do you have any idea on how I can fix this so that it isn't writing over the same memory?
Upvotes: 0
Views: 148
Reputation: 57388
Your problem lies here:
char str[1000]; // The one execption where a defined array length is allowed
What do you do there? You allocate a block of memory on the stack. Always the same block. Then strtok
operates on this block, and returns addresses in this block.
And you copy those addresses into *w
, but they are always addresses in the same block that gets overwritten over and over at each call of the tokenizer. The words from the very last run will be correct, all the other pointers will point to... the same words, jumbled together.
You need to make it so (w + n)->str
points to fresh memory, and to do that you can just
// Make str point to a COPY of split.
(w + n)->str = strdup(split);
As @Barmar noted, if you then deallocate your structures, you should also take care of freeing those pointers one by one (depending on the particular malloc
library, you might find it useful to free them in reverse order to ensure memory doesn't get fragmented).
Another, less wasteful possibility is to only run one malloc
, to duplicate the whole still untokenized line (then the malloc initial pointer will always be the one pointed by w[0]->str
) and then unleash strtok
on the copied string.
The memory will then look like (# are zeroes, MALLOCHDR is the malloc bookkeeping overhead)
[MALLOCHDR]This_is_a_sentence_[SLACK]
w[0]->str -^
w[1]->str ------^
w[2]->str ---------^
w[3]->str -----------^
instead of one strdup
(and therefore, malloc
) per word:
[MALLOCHDR]This_[SLACK][MALLOCHDR]is_[SLACK][MALLOCHDR]a_[SLACK}...
w[0]->str -^
w[1]->str ------------------------^
w[2]->str ---------------------------------------------^
(the SLACK bytes are due to the fact that malloc
usually actually allocates memory in chunks somewhat larger than the amount you request - I believe they're usually at least multiples of 8 bytes, maybe even 16. This kind of details is usually poorly documented and implementation dependent, but you'll want to be aware of them when you browse through 1,000 5-bytes words, and you wonder how you could occupy half a megabyte of RAM - or why after one hour of malloc'ing and free'ing the same bytes over and over, you abruptly somehow run out of memory. I've done both, more than once).
Upvotes: 4