Reputation: 187
I am currently getting some strange results when using strsep with multiple delimiters. My delimiters include the TAB character and the space character, as well as > and <.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main()
{
    char buffer[50];
    char *curr_str = NULL;
    const char delim[4] = "\t >";
    //const char delim[4] = "\t ><"; // This does not work
    snprintf(buffer, 50, "%s", "echo Hello");
    char *str_ptr = buffer;
    curr_str = strsep(&str_ptr, delim);
    if (curr_str != NULL)
        printf("%s\n", curr_str);
    curr_str = strsep(&str_ptr, delim);
    if (curr_str != NULL)
        printf("%s\n", curr_str);
    return (0);
}
This output is what I expect.
echo
Hello
However, as soon as I add the '<' character to the delimiters, I get:
cho
Somehow, the first character gets cut off. Is there a reason why this is happening?
Thank you.
Upvotes: 1
Views: 1212
Reputation: 23218
"...the first character gets cut off. is there a reason behind why this is occurring?"
Yes: undefined behavior, caused by passing a char array that is not null-terminated to a C string function.
If, once initialized, const char delim[4] does not contain a null terminator, it is just a char array, not a C string. It may or may not exhibit strange behavior, but passing it to any of the C string functions (such as curr_str = strsep(&str_ptr, delim);) invokes undefined behavior.
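const char delim[4];
has room for 4 chars, while
"\t ><" // contains exactly 4 chars, plus the implicit '\0'
needs 5. Because the array is exactly as long as the visible characters, C copies those 4 and silently drops the terminator (C++ rejects such an initializer as an error). In memory it can be pictured like this:
|\t| |>|<|?|?|?|   // ? = unknown content, possibly no null terminator
         ^ end of owned memory
It should contain the following:
|\t| |>|<|\0|?|?|   // null terminator
            ^ end of owned memory (5 chars wide)
This requires more room in the declaration, for example either of the following two options: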
const char delim[5] = "\t ><";
or
const char delim[] = "\t ><";
Upvotes: 1
Reputation: 144780
const char delim[4] = "\t ><";
does not define a proper C string because there is no room for the null terminator. Hence any non-zero bytes that happen to follow delim in memory, up to the first zero byte, become part of the delimiter set.
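This is of course undefined behavior. In your case the compiler may place delim immediately before buffer without any padding, effectively extending the delimiter set with every character of the string "echo Hello". The first call to strsep therefore finds that 'e' is a delimiter and returns an empty string (printed as an empty line). As a side effect it writes '\0' over buffer[0], which also terminates the runaway delimiter set right after '<', so the second call splits "cho Hello" at the space and returns "cho".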
You can check on this Godbolt instance that this is indeed the case in 32-bit mode, but not in 64-bit mode (remove the -m32 compiler option).
This problem is easy to fix. You can either let the compiler determine the length of the delim array:
const char delim[] = "\t ><";
or you can use a pointer to a string constant:
const char *delim = "\t ><";
Upvotes: 1
Reputation:
The second argument to strsep, delim, must be a null-terminated string (like all strings in C), so you have to leave room for the terminating character:
const char delim[5] = "\t ><"; // This does work
//const char delim[] = "\t ><"; // or this
If the string is not terminated, strsep keeps reading memory past the end of the array and treats whatever bytes it finds there as additional delimiters, which is what happened in your case.
Upvotes: 3