Reputation: 9841
In the days of Turbo C/C++ you could easily modify string literals, as in
char *str = "Hello";
str[1] = '1'; // makes it "H1llo"
Now they are stored in a read-only data section (typically .rodata rather than .bss)
and you are not allowed to modify them directly. Why? Doesn't this make modifying a string difficult?
Is there any quick fix (without side effects) for it? I can use strdup()-like
functions, but directly modifying strings was real fun.
Upvotes: 0
Views: 714
Reputation: 20392
To answer the "why" question: the trivial answer that others already mentioned is "it's undefined behavior", which means that the standard allows the compiler to do whatever it wants.
But why is it undefined behavior? In large software projects you can end up with thousands of duplicated strings. Maybe you have a macro that crashes the program if it detects some inconsistency and prints out a message. This macro is repeated in thousands of places and prints the same message every time. If string literals were modifiable, the message would have to be duplicated thousands of times in the final binary. To prevent that, many compilers and linkers deduplicate strings in the binary, which means that if you have
printf("foo\n"); printf("foo\n");
in your program, both "foo\n" strings may end up at the same memory location. Now imagine that you have
char *foo = "foo\n"; printf("foo\n");
The compiler might deduplicate the string, and somewhere else in your program you do
*foo = 'b';
Now the printf would print the wrong text. To prevent this, modifying string literals is undefined behavior, and modern compilers use that fact to deduplicate strings (on a much more advanced level than my examples here) and hopefully generate the program in such a way that modifying a string literal will crash.
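The deduplication described above can be observed, though not relied on, with a small C sketch. The helper name literals_share_storage is made up for illustration. Whether the two pointers compare equal is unspecified and depends on the compiler and its options, so no particular result is claimed here.

```c
/* Hypothetical helper: returns 1 if the two identical literals ended up
   at the same address, 0 otherwise. Either result is permitted. */
int literals_share_storage(void)
{
    const char *a = "foo\n";   /* const, so the compiler rejects writes through a */
    const char *b = "foo\n";
    return a == b;             /* compares addresses, not contents */
}
```

Declaring the pointers as const char * is a deliberate choice: an accidental write such as *a = 'b'; then fails at compile time instead of being undefined behavior at run time.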
Upvotes: 2
Reputation: 106068
As you observe, you're not allowed by the Standard to modify a string literal's content, and modern compilers and OSs arrange memory permissions accordingly.
The reason is that the compiler may see a literal used in many places in the code, for example:
in x.cpp: std::cerr << "Error" << separator << msg << '\n';
in y.cpp: if (x == "Error") ...
in z.cpp: q = "StackOverflowError";
It's very desirable to avoid having all those string literals appear separately in the executable image and loaded process memory; instead, the compiler may arrange a single memory region containing "StackOverflowError\0" and use pointers to the relevant start character (whether the 'S' or the 'E') at the points of use.
If you were allowed to modify the value, perhaps deciding that you wanted x.cpp to display "Alert" instead of "Error", it could unintentionally break the code from y.cpp and z.cpp too.
Is there a quick fix?
Well, it depends on what you think is broken. If you mean a way to modify the string literals, then no: that's undefined behaviour for the reasons explained above, and the memory protection mechanisms vary with the OS etc. If you mean being able to modify textual data in a similar way, then yes. Inside a function,
char* s = "abc";
puts the pointer s on the stack, but it points to the string literal in read-only data (typically a section such as .rodata, not .bss). If you instead write:
char s[] = "abc";
then s is still on the stack but is now an array with space for 4 characters. The string literal is still in read-only data, but whenever that line runs it copies from the latter to the former, after which you're able to modify the stack-based copy, e.g. s[1] = 'x';
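As a minimal sketch of the array form in C (the function name demo_array_copy and the out parameter are made up for illustration), the following copies the literal into a local array, edits the copy, and hands the result back to the caller:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: out must have room for at least 4 chars. */
void demo_array_copy(char *out)
{
    char s[] = "abc";        /* local array: 'a', 'b', 'c', '\0' */
    assert(sizeof s == 4);   /* three characters plus the terminator */
    s[1] = 'x';              /* fine: this writes to the array, not the literal */
    memcpy(out, s, sizeof s);
}
```

After char buf[4]; demo_array_copy(buf); the buffer holds "axc", and the literal "abc" itself is never written to.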
Of course, in C++, putting your data into a std::string is normally a better approach.
Upvotes: 10
Reputation: 84792
Modifying string literals is undefined behaviour under both the C and C++ standards.
The fact that you could do it with older compilers does not imply that it's legal in all compilers; they merely used to be more permissive and did not protect you from writing to this memory area.
You can do:
char str[] = "Hello";
str[1] = '1';
This will create a mutable array of char and initialise it with a copy of the string literal's value (including the \0 terminator).
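When the text is only known at run time, or the copy has to outlive the function, a heap copy in the spirit of the strdup() mentioned in the question is another option. strdup() is POSIX (and was added to C23), so the sketch below uses only malloc and memcpy. The name copy_string is made up for illustration and is not taken from any of the answers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a writable heap copy of src, or NULL if
   allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1 for the '\0' terminator */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, src, size);     /* copies the terminator too */
    return copy;
}
```

With char *str = copy_string("Hello"); the assignment str[1] = '1'; is well defined once str has been checked against NULL, and the memory is released with free(str); when it is no longer needed.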
Upvotes: 3