Reputation: 63
I have a string, and in it I need to find a substring and replace it. The one to be found and the one that'll replace it are of different length. My code, partially:
char *source_str = "aaa bbb CcCc dddd kkkk xxx yyyy";
char *pattern = "cccc";
char *new_sub_s = "mmmmm4343afdsafd";
char *sub_s1 = strcasestr(source_str, pattern);
printf("sub_s1: %s\r\n", sub_s1);
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); // Memory corruption
char *new_str = (char *)malloc(strlen(source_str) - strlen(pattern) + strlen(new_sub_s) + 1);
strcat(new_str, '\0');
strcat(new_str, "??? part before pattern ???");
strcat(new_str, new_sub_s);
strcat(new_str, "??? part after pattern ???");
Why do I have memory corruption?
How do I effective extract and replace pattern
with new_sub_s
?
Upvotes: 2
Views: 184
Reputation: 8286
snprintf
can be used to calculate the memory needed and then print the string to the allocated pointer.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main ( void) {
char *source_str = "aaa bbb CcCc dddd kkkk xxx yyyy";
char *pattern = "cccc";
char *new_sub_s = "mmmmm4343afdsafd";
char *sub_s1 = strcasestr(source_str, pattern);
int span = (int)( sub_s1 - source_str);
char *tail = sub_s1 + strlen ( pattern);
size_t size = snprintf ( NULL, 0, "%.*s%s%s", span, source_str, new_sub_s, tail);
char *new_str = malloc( size + 1);
snprintf ( new_str, size, "%.*s%s%s", span, source_str, new_sub_s, tail);
printf ( "%s\n", new_str);
free ( new_str);
return 0;
}
Upvotes: 1
Reputation: 144780
There are multiple problems in your code:
sub_s1
was found in the string. What if there is no match?printf("sub_str before pattern: %s\r\n", sub_s1 - source_str);
passes a difference of pointers for %s
that expects a string. The behavior is undefined.strcat(new_str, '\0');
has undefined behavior because the destination string is uninitialized and you pass a null pointer as the string to concatenate. strcat
expects a string pointer as its second argument, not a char
, and '\0'
is a character constant with type int
(in C) and value 0
, which the compiler will convert to a null pointer, with or without a warning. You probably meant to write *new_str = '\0';
You cannot compose the new string with strcat
as posted: because the string before the match is not a C string, it is a fragment of a C string. You should instead determine the lengths of the different parts of the source string and use memcpy
to copy fragments with explicit lengths.
Here is an example:
char *patch_string(const char *source_str, const char *pattern, const char *replacement) {
char *match = strcasestr(source_str, pattern);
if (match != NULL) {
size_t len = strlen(source_str);
size_t n1 = match - source_str; // # bytes before the match
size_t n2 = strlen(pattern); // # bytes in the pattern string
size_t n3 = strlen(replacement); // # bytes in the replacement string
size_t n4 = len - n1 - n2; // # bytes after the pattern in the source string
char *result = malloc(n1 + n3 + n4 + 1);
if (result != NULL) {
// copy the initial portion
memcpy(result, source_str, n1);
// copy the replacement string
memcpy(result + n1, replacement, n3);
// copy the trailing bytes, including the null terminator
memcpy(result + n1 + n3, match + n2, n4 + 1);
}
return result;
} else {
return strdup(source_str); // always return an allocated string
}
}
Note that the above code assumes that the match in the source string has be same length as the pattern string (in the example, strings "cccc"
an "CcCc"
have the same length). Given that strcasestr
is expected to perform a case independent search, which is confirmed by the example strings in the question, it might be possible that this assumption fail, for example if the encoding of upper and lower case letters have a different length, or if accents are matched by strcasestr
as would be expected in French: "é"
and "E"
should match but have a different length when encoded in UTF-8. If strcasestr
has this advanced behavior, it is not possible to determine the length of the matched portion of the source string without a more elaborate API.
Upvotes: 3
Reputation: 107769
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); // Memory corruption
You're taking the difference of two pointers, and printing it as though it was a pointer to a string. In practice, on your machine, this probably calculates a meaningless number and interprets it as a memory address. Since this is a small number, when interpreted as an address, on your system, this probably points to unmapped memory, so your program crashes. Depending on the platform, on the compiler, on optimization settings, on what else there is in your program, and on the phase of the Moon, anything could happen. It's undefined behavior.
Any half-decent compiler would tell you that there's a type mismatch between the %s
directive and the argument. Turn those warnings on. For example, with GCC:
gcc -Wall -Wextra -Werror -O my_program.c
char *new_str = (char *)malloc(…); strcat(new_str, '\0'); strcat(new_str, "…");
The first call to strcat
attempts to append '\0'
. This is a character, not a string. It happens that since this is the character 0, and C doesn't distinguish between characters and numbers, this is just a weird way of writing the integer 0
. And any integer constant with the value 0 is a valid way of writing a null pointer constant. So strcat(new_str, '\0')
is equivalent to strcat(new_str, NULL)
which will probably crash due to attempting to dereference the null pointer. Depending on the compiler optimizations, it's possible that the compiler will think that this block of code is never executed, since it's attempting to dereference a null pointer, and this is undefined behavior: as far as the compiler is concerned, this can't happen. This is a case where you can plausibly expect that the undefined behavior causes the compiler to do something that looks preposterous, but makes perfect sense from the way the compiler sees the program.
Even if you'd written strcat(new_str, "\0")
as you probably intended, that would be pointless. Note that "\0"
is a pointless way of writing ""
: there's always a null terminator at the end of a string literal¹. And appending an empty string to a string wouldn't change it.
And there's another problem with the strcat
calls. At this point, the content of new_str
is not initialized. But strcat
(if called correctly, even for strcat(new_str, "")
, if the compiler doesn't optimize this away) will explore this uninitialized memory and look for the first null byte. Because the memory is uninitialized, there's no guarantee that there is a null byte in the allocated memory, so strcat
may attempt to read from an unmapped address when it runs out of buffer, or it may corrupt whatever. Or it may make demons fly out of your nose: once again it's undefined behavior.
Before you do anything with the newly allocated memory area, make it contain the empty string: set the first character to 0. And before that, check that malloc
succeeded. It will always succeed in your toy program, but not in the real world.
char *new_str = malloc(…);
if (new_str == NULL) {
return NULL; // or whatever you want to do to handle the error
}
new_str[0] = 0;
strcat(new_str, …);
¹ The only time there isn't a null pointer at the end of a "…"
is when you use this to initialize an array and the characters that are spelled out fill the whole array without leaving room for a null terminator.
Upvotes: 1