Ree

Reputation: 6211

Is modifying a string pointed to by a pointer valid?

Here's a simple example of a program that concatenates two strings.

#include <stdio.h>

void strcat(char *s, char *t);

void strcat(char *s, char *t) {
    while (*s++ != '\0');
    s--;
    while ((*s++ = *t++) != '\0');
}

int main() {
    char *s = "hello";
    strcat(s, " world");
    while (*s != '\0') {
        putchar(*s++);
    }
    return 0;
}

I'm wondering why it works. In main(), I have a pointer to the string "hello". According to the K&R book, modifying a string like that is undefined behavior. So why is the program able to modify it by appending " world"? Or is appending not considered as modifying?

Upvotes: 1

Views: 490

Answers (9)

Pete Kirkham

Reputation: 49311

I'm wondering why it works

It doesn't. It causes a segmentation fault on Ubuntu x64; code that works only on your machine doesn't really work.

Moving the modified data onto the stack gets around the read-only data protection on Linux:

int main() {
    char b[] = "hello";
    char c[] = " ";
    char *s = b;

    strcat(s, " world");

    puts(b);
    puts(c);

    return 0;
}

Even then you are only safe as long as " world" fits in the unused space between objects on the stack, and writing past the end of b is still undefined behavior. Change b to "hello to" and the stack protector detects the corruption:

*** stack smashing detected ***: bin/clobber terminated

Upvotes: 1

Christoph

Reputation: 169603

According to the C99 specification (C99: TC3, 6.4.5, §5), string literals are

[...] used to initialize an array of static storage duration and length just sufficient to contain the sequence. [...]

which means they have the type char [], i.e. modification is possible in principle. Why you shouldn't do it is explained in §6:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

Different string literals with the same contents may - but don't have to - be mapped to the same memory location. As the behaviour is undefined, compilers are free to put them in read-only sections in order to cleanly fail instead of introducing possibly hard to detect error sources.

Upvotes: 1

Paul Beckingham

Reputation: 14905

It also depends on how the pointer is declared. For example, this can change both ptr and the data ptr points to:

char * ptr;

Can change the data ptr points to, but not ptr itself:

char * const ptr;

Can change ptr, but not the data ptr points to (char const * ptr means exactly the same thing):

const char * ptr;

Can't change anything:

const char * const ptr;

Upvotes: 1

Norman Ramsey

Reputation: 202505

Perhaps surprisingly, your compiler has allocated the literal "hello" into read/write initialized data instead of read-only initialized data. Your assignment clobbers whatever is adjacent to that spot, but your program is small and simple enough that you don't see the effects. (Put it in a for loop and see if you are clobbering the " world" literal.)

It fails on Ubuntu x64 because gcc puts string literals in read-only data, and when you try to write, the hardware MMU objects.

Upvotes: 2

Renze de Waal

Reputation: 533

s points to a bit of memory that holds "hello", but was not intended to contain more than that. This means that it is very likely that you will be overwriting something else. That is very dangerous, even though it may seem to work.

Two observations:

  1. The * in *s-- is not necessary. s-- would suffice, because you only want to decrement the value.
  2. You don't need to write strcat yourself. It already exists (you probably knew that, but I'm telling you anyway:-)).

Upvotes: 0

overslacked

Reputation: 4137

I +1'd MSN, but as for why it works, it's because nothing has come along to fill the space behind your string yet. Declare a few more variables, add some complexity, and you'll start to see some wackiness.

Upvotes: 4

Crashworks

Reputation: 41404

The compiler lets you write through s because you have declared it as a pointer to non-const; a pointer to a string literal like that should be

const char *s = "hello";

With the const modifier missing, you've basically disabled the safety that prevents you from writing into memory that you shouldn't write into. C does very little to keep you from shooting yourself in the foot. In this case you got lucky and only grazed your pinky toe.

Upvotes: 0

Martin Beckett

Reputation: 96109

You were lucky this time.
Especially in debug mode, some compilers put spare memory (often filled with some obvious value) around variables so that code like this can be detected.

Upvotes: 1

MSN

Reputation: 54614

Undefined behavior means a compiler can emit code that does anything. Working is a subset of undefined.

Upvotes: 19
