HDHDHD

Reputation: 31

Is String Literal in C really not modifiable?

As far as I know, a string literal can't be modified. For example:

char* a = "abc";
a[0] = 'c';

That would not work since string literal is read-only. I can only modify it if:

char a[] = "abc";
a[0] = 'c';

However, in the post "Parse $PATH variable and save the directory names into an array of strings", the first answer modified a string literal in these two places:

path_var[j]='\0';
array[current_colon] = path_var+j+1;

I'm not very familiar with C so any explanation would be appreciated.

Upvotes: 2

Views: 1008

Answers (4)

Luis Colorado

Reputation: 12708

There are several reasons why you had better not modify them:

  • First, the operating system and/or the compiler can enforce the non-writable property of string literals by putting them in read-only memory (e.g. ROM) or in the .text segment.
  • Second, the compiler is allowed to merge string literals, so if you modify one (and succeed), you can get surprises later: other literals that were merged with it (e.g. because one of them is a suffix of the other) change for no apparent reason.
  • If you need an initialized string that is modifiable, declare an array instead, which you can freely modify:
char array[100] = "abc"; // initialized to { 'a' ,'b', 'c', '\0',
                         //         /* and 96 more '\0' characters */
                         // };

Upvotes: 0

programmer

Reputation: 679

Code blocks from the post you linked:

const char *orig_path_var = getenv("PATH");
char *path_var = strdup(orig_path_var ? orig_path_var : "");

const char **array;
array = malloc((nb_colons+1) * sizeof(*array));
array[0] = path_var;
array[current_colon] = path_var+j+1;

First block:

  • In the 1st line, getenv() returns a pointer to a string, and that pointer is stored in orig_path_var. The string that getenv() returns should be treated as read-only, as the behaviour is undefined if the program attempts to modify it.
  • In the 2nd line, strdup() is called to make a duplicate of this string. strdup() does this by calling malloc() to allocate memory for the length of the string + 1 (for the terminating '\0') and then copying the string into that memory.
  • Since malloc() is used, the copy is stored on the heap, which allows us to modify it.

Second block:

  • The 1st and 2nd lines declare array and make it point to a malloc()-allocated array of const char * pointers. There are nb_colons+1 pointers in the array.
  • In the 3rd line, the 0th element of array is initialized to path_var (remember, it is not the string returned by getenv(), but a copy of it).
  • In the 4th line, the current_colon-th element of array is set to path_var+j+1. If you don't understand pointer arithmetic, this just means it assigns the address of the char at index j+1 of path_var to array[current_colon].

As you can see, the code is not operating on a read-only string like the one orig_path_var points to. Instead it uses a copy made with strdup(). This seems to be where your confusion stems from, so take a look at this:

char *strdup(const char *s);

The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3).

The above text shows what strdup() does according to its man page.

It may also help to read the malloc() man page.

Upvotes: 1

Steve Summit

Reputation: 48052

In programming, there are quite a few rules that are up to you to follow, even though they are not — necessarily — enforced. And "String literals in C are not modifiable" is one of those. So is "Strings returned by getenv should not be modified".

There are some real-world analogies that apply. Here's one: If you're at an intersection, and the light is red, you're not supposed to cross. But, much of the time, if you break the rule, and cross, you might get away with it. You might get a ticket from a policeman — or you might not. You might cause a crash — or you might not. But if you get lucky, and neither of these things happens, that does not imply that crossing the intersection against the red light was okay — it's still quite true that it was very much against the rules.

Similarly, in C, if you write some code that modifies a string literal, or a string returned from getenv, you might get away with it. The compiler might give you a warning or error message — or it might not. Your program might crash — or it might not. But if the program seems to work, that does not imply that these strings are actually modifiable — they're not.

Upvotes: 2

Kaz

Reputation: 58647

In the example

char* a = "abc";

the token "abc" produces a literal object in the program image, and denotes an expression which yields that object's address.

In the example

char a[] = "abc";

The token "abc" serves as an array initializer, and doesn't denote a literal object. It is equivalent to:

char a[] = { 'a', 'b', 'c', 0 };

The individual character values of "abc" are literal data recorded somewhere and somehow in the program image, but they are not accessible as a string literal object.

The array a isn't a literal, needless to say. Modifying a doesn't constitute modifying a literal, because it isn't one.

Regarding the remark:

That would not work since string literal is read-only.

That isn't accurate. No version of the ISO C standard to date specifies any requirements for what happens if a program tries to modify a string literal. It is undefined behavior. If your implementation stops the program with some diagnostic message, that is a consequence of undefined behavior, not something the standard requires.

C implementations are not required to support string literal modification, which has benefits such as:

  • standard-conforming C programs can be translated into images that can be burned into ROM chips, such that their string literals are accessed directly from that ROM image without having to be copied into RAM on start-up.

  • compilers can condense the storage for string literals by taking advantage of situations when one literal is a suffix of another. The expression "string" + 2 == "ring" can yield true. Since a strictly conforming program will not do something like "ring"[0] = 'w', due to that being undefined behavior, such a program will thereby avoid falling victim to the surprise of "string" unexpectedly turning into "stwing".

Upvotes: 1
