Reputation: 841
I started my adventure with C++ one week back. I have read a lot about C++. I was experimenting with the following:
char * String1 = "abcdefgh";
I, then, tried to modify its value in the following way:
String1[2] = 'f';
This resulted in an UNHANDLED EXCEPTION. But the following results in proper execution:
char String2[9]="abcdefgh";
String2[7]='s';
I tried to extract information about the binary generated from the code above using DUMPBIN, a Visual Studio tool. I used the /ALL option to dump all the information contained in the binary.
I could see two instances of "abcdefgh" in the RAWDATA section. And I understand why.
My questions are as follows:
1) Although both String1 and String2 are essentially pointers to two different instances of the same character sequence, why is the String1 manipulation not a legal one?
2) I know the compiler generates a SYMBOL TABLE for mapping variable names and their values. Is there any tool to visualize the SYMBOL TABLE on Windows?
3) If I have an array of integers instead of the character sequence, can it be found in the RAWDATA?
I could also see the following in RAWDATA:
Unknown Runtime Check Error.........
Stack memory around _alloca was corrupted.......
....A local variable was used before it was initialized.........
....Stack memory was corrupted..
........A cast to a smaller data type has caused a loss of data.
If this was intentional, you should mask the source of the cast with the appropriate bitmask.
How do these things get into the binary executable? What is the purpose of having these messages in the binary (which obviously is not human-readable)?
EDIT: My question 1) has a word INSTANCES, which is used to mean the following:
The character sequence "abcdefgh" is drawn from the set of lowercase English letters, i.e., {a,b,...,y,z}. This sequence is INSTANTIATED twice and stored at two memory locations, say A and B. String1 points to A (assumption) and String2 points to B. There is no conceptual mix-up in the question.
What I wanted to comprehend was the difference in the attributes of the memory locations A and B, i.e., why one of them was immutable.
Upvotes: 0
Views: 938
Reputation: 98425
Note: all of the code below refers to a scope within a function.
The code below initializes a writeable buffer, string2, with data. The compiler generates initialization code to copy from the read-only, compiler-generated string to this buffer.
char string2[] = "abcdefgh";
The code below stores a pointer to a read-only, compiler-generated string in string1. The string's contents are in a read-only section of the executable image. That's why modifying it will fail.
char * string1 = "abcdefgh";
You can make it work by having string1 point to a writeable buffer. This can be achieved by copying the string:
char * string1 = strdup("abcdefgh");
....
free(string1); // don't forget to free the buffer!
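In C++ the same idea can be expressed without manual memory management. The following is only a minimal sketch, not the only way to do it: the helper name with_char_replaced is hypothetical, and it assumes that copying the literal into a std::string is acceptable for your use case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the answer above): copies a literal into a
// std::string, which owns writable storage, then changes one character.
std::string with_char_replaced(const char* literal, std::size_t index, char c) {
    std::string copy(literal);      // the literal itself is never written to
    assert(index < copy.size());    // guard against writing past the end
    copy[index] = c;
    return copy;                    // storage is released automatically
}
```

Calling with_char_replaced("abcdefgh", 2, 'f') modifies only the copy, so there is nothing to free and the read-only literal is left untouched.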
Upvotes: 5
Reputation: 153929
You cannot modify a string literal. The type of a string literal is char const[], and any attempt to modify one is undefined behavior.
And given a statement like:
char* s1 = "a litteral";
, the compiler really should generate a warning. The implicit conversion to non-const here is deprecated (and was removed entirely in C++11), and was only introduced into the language to avoid breaking existing code (dating from an epoch when C didn't have const).
In the case:
char s2[] = "init";
, there isn't really a string literal. The "string literal" is in fact an initialization specification, and unlike string literals, doesn't appear anywhere in memory; it is used by the compiler to determine how s2 should be initialized, and is the exact equivalent of:
char s2[] = { 'i', 'n', 'i', 't', '\0' };
(It is a bit more convenient to write.)
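To make the equivalence concrete, here is a small sketch that compares the two forms. The function name initializers_match is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical check: true when both forms of initialization produce
// arrays of the same size and contents.
bool initializers_match() {
    char a[] = "init";
    char b[] = { 'i', 'n', 'i', 't', '\0' };
    static_assert(sizeof(a) == 5, "four characters plus the terminating null");
    static_assert(sizeof(a) == sizeof(b), "both arrays have five elements");
    a[0] = 'I';   // legal: a is an ordinary, writable array
    b[0] = 'I';   // same for b
    return std::memcmp(a, b, sizeof(a)) == 0;
}
```

Both arrays have five elements, including the terminating null, and both can be modified freely because neither one is a string literal.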
--
A short historical sidelight: early C didn't have const. The type of a string literal was char[], and modifying it was legal. This led to some very horrible code:
char* f() { return "abcd"; }
/* ... */
f()[1] = 'x';
and the next time you called f, it returned "axcd". A literal which doesn't have the value which appears in the source listing is not the way to readable code, and the C standards committee decided that this was one feature it was better not to keep.
Upvotes: 2
Reputation: 16039
1) As pointed out in the C++ standard (2003) (http://www.iso.org/iso/catalogue_detail.htm?csnumber=38110):
1 A string literal is a sequence of characters surrounded by double quotes, optionally beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type "array of n const char" and static storage duration (basic.stc), where n is the size of the string as defined below, and is initialized with the given characters. A string literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type "array of n const wchar_t" and has static storage duration, where n is the size of the string as defined below, and is initialized with the given characters.
2 Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
As stated above, modifying a string literal is not ill-formed, but it is undefined behavior. In practice, with VS on Windows you get an exception, and with g++ on Linux you will typically get a segmentation fault (the two are essentially the same kind of failure).
2) You can use a disassembler and inspect the data section of the exe file (see the wikibook x86 Disassembly/Windows Executable Files for more about executable file structures)
3) Yes, it should be in the .data section of the exe file
Upvotes: 1
Reputation: 14086
char string[] = "foo"
This allocates a char array, and initializes it with the values {'f', 'o', 'o', '\0'}. You get "your own" storage for the chars, and you can modify the array.
char* strptr = "foo"
This allocates a pointer, and sets the value of that pointer to the address of a char array which contains {'f', 'o', 'o', '\0'}. The pointer is yours to do with as you wish, but the char array is not. In fact, the type of the array is not char[], but const char[], and strptr really ought to be declared as const char* so that you do not mistakenly attempt to modify the const array.
In the first case, "foo" is an array initializer. In the second, "foo" is a string literal.
More specific details about exactly where the memory for each situation is located tend to be unspecified by the standard. However, generally speaking, char string[] = "foo" allocates a char array on the stack, while char* strptr = "foo" allocates a char pointer on the stack and (statically) allocates a const char array in the data section of the executable.
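The difference between the two declarations can be sketched in code. This is an illustration only: the function name array_is_independent_copy is hypothetical, and the pointer is declared const char* as recommended above so that accidental writes are rejected at compile time.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical demonstration: the array owns a private copy of the
// characters, while the pointer merely refers to the literal's storage.
bool array_is_independent_copy() {
    char arr[] = "foo";          // array: its own writable storage
    const char* ptr = "foo";     // pointer: refers to the literal
    static_assert(sizeof(arr) == 4, "three characters plus the terminator");
    arr[0] = 'b';                // fine: modifies the local copy only
    // ptr[0] = 'b';             // would not compile: ptr points to const char
    return std::strcmp(arr, "boo") == 0 && std::strcmp(ptr, "foo") == 0;
}
```

Changing arr has no effect on what ptr refers to, because the array holds a separate copy of the characters.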
Upvotes: 1
Reputation: 96109
char * String1 = "abcdefgh";
In C (and C++) the string literal here is effectively const. The compiler is allowed to store fixed const data however it likes: it may have a separate DATA segment, or it might have a completely const program store (in a Harvard architecture).
char String2[9]="abcdefgh";
Allocates a 9-element array of chars and just happens to initialise it with some string. You can do what you want with the array. Arrays of any other type would be stored in the same way.
The error messages for some runtime errors are stored in the program data segment (in the same way as your original char* string). Some of them, like "this program needs windows", must obviously be in there rather than in the OS, because DOS wouldn't know a program needed a later version of Windows. But I'm not sure why these particular runtime errors aren't created by the OS.
Upvotes: 4