Reputation: 371
I am trying to make a small function to get a string between two tags. But I get a segfault on str[len -3] = '\0';
Is it not possible to add a null-termination to the string passed and then send a pointer back?
Is it bad practice to change the index of the pointer instead of copying it over to a buffer and send it back?
Do I get a memory leak from the 3 bytes never getting freed?
/*
format for a message
<m>Hello world!</>13594750394883323106
<m>"msg"</><checksum>
*/
//returns the string beetween tags
char *GetMessage(char *str) {
int len = strlen(str);
for (int i = 0; i < len; i++) {
if (str[i] == '<' && str[i + 1] == 'm' && str[i + 2] == '>') {
if (str[len - 3] == '<' && str[len - 2] == '/' && str[len - 1] == '>') {
str[len - 3] = '\0';
return &str[3];
}
}
}
return NULL;
}
Upvotes: 2
Views: 172
Reputation: 144550
You are getting a crash on str[len-3] = '\0';
because you are trying to write to a read-only location. String literals used as values can be placed in read-only storage, attempting to modify them invokes undefined behavior.
You cannot pass constant strings to this function.
There is a bug in your code: return &str[3];
should be return &str[i + 3];
.
Furthermore, your code does not handle the examples in the comments because the </>
substring is not at the end.
Here is a simplified version:
#include <string.h>
char *GetMessage(char *str) {
str = strstr(str, "<m>");
if (str != NULL) {
str += 3;
char *p = strstr(str, "</>");
if (p != NULL)
*p = '\0';
}
return str;
}
Note that the pointer returned by GetMessage()
points inside the argument string. You cannot pass it to free()
to deallocate the string, this would invoke undefined behavior. Only the original value returned by malloc()
can be passed to free()
.
Upvotes: 1
Reputation: 371
To resume answers for the questions.
1) Is it not possible to add a nulltermination to the string passed and then send a pointer back?
It turned out that the string passed as a literal was the problem in this case. Jonathan Leffler pointed this out in the comments and that was exactly how I tested the function.
GetMessage("<m>Hello world!</>");
When tested in conjunction with other functions answering without literals it worked well.
GetMessage(ReadSerialData());
2) Is it bad practice to change the index of the pointer instead of copying it over to a buffer and send it back?
This appears to be of preference. But should not caus any problems.
3) I get a memory leak from the 3 bytes never getting freed?
This was really good explained here
Thanks for all the inputs!
Upvotes: 0
Reputation: 16512
To better reasoning about this, let's to draw the memory layout of your string. If I got it correctly it is something like:
111111
0123456789012345...
-> xxxxx<m>Hi</>yyy...\0
Now you want to pass a pointer to the first character of your string to GetMessage()
and print the first message. Something like
fullmsg ="....";
m = fullmsg;
m = GetMessage(m);
printf("msg: %s\n",m);
... // Advance m
Of course you can't do fullmsg=GetMessage(fullmsg)
or weird things may happen (memory leak being the least :) ).
When you've found the <m>
tag your situation is:
111111
0123456789012345...
str -> xxxxx<m>Hi</>yyy...\0
^ ^
i len
Which shows that returning str+3
does not do what you want. Your return value should be str+i+3
.
In the same vein, it's not str[len-3]
that you should put to \0
. Just imagine the effect on: GetMessage("x<m>aa</>yzyy")
. The character in position len-3
is z
. Not what you wanted, I guess.
What you could do is to use another index to find the end of the message:
for (j = i+1; j<len-2; j++) {
if (str[j] == '<' && str[j+1] == '/' && str[j+2] == '>') {
// end of message found!!!!
}
}
So that when you have found the end of the message your situation is:
111111
0123456789012345...
str -> xxxxx<m>Hi</>yyy...\0
^ ^ ^
i j len
I wish I could tell you that you could simply do str[j]='\0'
and return str+i+3 but, unfortunately I can't. If you do it and pass a literal string (
m=GetMessage("Hi there!")` you will get a coredump as the memory used for a string between quotes is read-only.
A possible solution is to change the semantic of your GetMessage()
a little bit:
// returns the length of the message if the string starts with <m>
int GetMessage(char *str) {
int len = 0;
if (str[0]=='<' && str[1]=='m' && str[2]=='>') {
str += 3;
while (str[0] != '\0') {
if (str[0]=='<' && str[1]=='/' && str[2] == '>')
return len;
str++;
}
}
return 0;
}
Now, when you want to print the message you can do something like:
fullmessage = "xxxx<m>Hi</>yyyyy";
m = fullmessage;
l = 0;
/* skip until you find a '<m>' tag */
while (m[0] != '\0' && ((l=GetMessage(m)) == 0) m++;
/* l can be 0 here if there was no message in the string */
if (l>0) printf("msg = %.*s",l,m+3);
I didn't test it fully but I hope you got the idea.
Upvotes: 3