Curnelious

Reputation: 1

C array initializing - some basics

I've read in a book that when you have a string like "blabla", there is a hidden char array behind it, that the expression returns the address of its first element, and that it behaves like a const array.

This makes me confused about 2 scenarios :

  1. char a[7] = "blabla" , is not possible because "blabla" returns the address of the first element of an array, so how would you put an address into a instead of the actual elements?

  2. it says when you see "blabla" it means like a const char array , and that means I can't change a, at all (which is not true).

I guess something really basic here is unclear to me.

Upvotes: 2

Views: 86

Answers (4)

alinsoar

Reputation: 15803

It is late, but here is my answer anyway.

Let us distinguish between this program

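int main(void)
{
  char *a = "blabla";  /* a points at the string literal itself */
  a[3] = 'x';          /* undefined behavior: writes into the literal */
}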

and this one, yours.

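int main(void)
{
  char a[7] = "blabla";  /* a is its own array, initialized from the literal */
  a[3] = 'x';            /* fine: writes into the array a */
}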

So there is a big difference between them.

In the first case the object a is a pointer whose value points to the beginning of the blabla string.

Dumping the assembled code, we see:

  4004aa:       48 c7 45 f8 54 05 40    movq   $0x400554,-0x8(%rbp)
  4004b1:       00 
  4004b2:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4004b6:       48 83 c0 03             add    $0x3,%rax
  4004ba:       c6 00 78                movb   $0x78,(%rax)

So it sets the pointer to the address 0x400554.

objdump reports that this address is in the .rodata section.

Disassembly of section .rodata:

0000000000400550 <_IO_stdin_used>:
  400550:       01 00                   add    %eax,(%rax)
  400552:       02 00                   add    (%rax),%al
  400554:       62                      (bad)  
  400555:       6c                      insb   (%dx),%es:(%rdi)
  400556:       61                      (bad)  
  400557:       62                      .byte 0x62
  400558:       6c                      insb   (%dx),%es:(%rdi)
  400559:       61                      (bad)  

So the compiler placed the string blabla in .rodata at that address, and the program then tries to modify .rodata, which ends in a segmentation fault.

readelf reports no W access on .rodata:

[13] .rodata           PROGBITS         0000000000400550  00000550
     000000000000000b  0000000000000000   A       0     0     4

On the other hand, what you are trying to do (the 2nd program) compiles to:

00000000004004a6 <main>:
  4004a6:       55                      push   %rbp
  4004a7:       48 89 e5                mov    %rsp,%rbp
  4004aa:       c7 45 f0 62 6c 61 62    movl   $0x62616c62,-0x10(%rbp)
  4004b1:       66 c7 45 f4 6c 61       movw   $0x616c,-0xc(%rbp)
  4004b7:       c6 45 f6 00             movb   $0x0,-0xa(%rbp)
  4004bb:       c6 45 f3 78             movb   $0x78,-0xd(%rbp)

In this case, the array object a is allocated 7 bytes in the stack frame, from %rbp-0x10 up to %rbp-0xa.

When it executes a[3]='x', it modifies the stack at %rbp-0xd. The stack has write permission, so all is well.
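The same behavior can be checked from C without reading the disassembly. This is only a minimal sketch; the function name modify_local_copy is made up for illustration and is not part of the programs above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: initializes a writable local array from the
   literal, changes one element, and reports whether the array now
   holds the expected text. */
int modify_local_copy(void)
{
    char a[7] = "blabla";   /* 7 bytes of automatic storage, filled from the literal */
    a[3] = 'x';             /* writes into the array, not into the literal */
    assert(a[6] == '\0');   /* the terminator was copied as well */
    return strcmp(a, "blaxla") == 0;
}
```

Calling it repeatedly should give the same result each time, because every call initializes a fresh array from the unmodified literal.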

For more information, I suggest reading https://en.wikipedia.org/wiki/Identity_and_change

Upvotes: 1

Vlad from Moscow

Reputation: 311048

According to the C Standard (6.3.2.1 Lvalues, arrays, and function designators)

3 Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

So in this declaration

char a[7] = "blabla";

the string literal has the type char[7], because the terminating zero counts as one of its elements, and its elements are used to initialize the elements of the character array a.

In fact this declaration is equivalent to the declaration

char a[7] = { 'b', 'l', 'a', 'b', 'l', 'a', '\0' };
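A quick way to convince yourself of this equivalence is to compare the two arrays byte by byte. The sketch below is only illustrative, and the function name same_initialization is an assumption of mine.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both forms of initialization
   produce arrays with the same size and the same contents. */
int same_initialization(void)
{
    char a[7] = "blabla";
    char b[7] = { 'b', 'l', 'a', 'b', 'l', 'a', '\0' };

    assert(sizeof a == 7);  /* six letters plus the terminating zero */
    return sizeof a == sizeof b && memcmp(a, b, sizeof a) == 0;
}
```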

Take into account that in C string literals have types of non-constant character arrays. Nevertheless, the literals themselves may not be modified.

From the C Standard (6.4.5 String literals)

7 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

So you may write for example

char *s = "blabla";

In this case according to the first quote from the C standard the string literal is converted to pointer to its first element and the value of the pointer is assigned to the variable s.

That is, an unnamed character array is created in static memory, and the address of its first element is assigned to the pointer s. You may not use the pointer to change the literal; that is, you may not write, for example,

char *s = "blabla";
s[0] = 'B';
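Reading through such a pointer is fine; only writing is a problem. The sketch below also shows the sizeof exception from the first quote: applied to the literal, sizeof sees the whole array, while applied to s it sees only the pointer. The function names are hypothetical, and I use const char * here as a precaution, although plain char * is also accepted in C.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof is one of the contexts where the literal is NOT converted
   to a pointer, so this is the size of the array char[7]. */
size_t literal_size(void)
{
    return sizeof "blabla";
}

/* Here the literal has already been converted to a pointer, so
   sizeof reports the size of the pointer object itself. */
size_t pointer_size(void)
{
    const char *s = "blabla";
    return sizeof s;
}

/* Reading elements through the pointer is well defined. */
char first_char(void)
{
    const char *s = "blabla";
    assert(s[6] == '\0');   /* the terminating zero is part of the array */
    return s[0];
}
```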

In C++ string literals indeed have types of constant character arrays. So you have to write in a C++ program

const char *s = "blabla";

In C you may also write

char a[6] = "blabla";
     ^^^^

In this case the terminating zero of the string literal will not be used to initialize the character array a. So the array will not contain a string.

In C++ such a declaration is invalid.
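To illustrate the C case, the following sketch checks that such an array really has no terminating zero. It is an assumption-laden example rather than anything from the Standard: the function name is made up, and some compilers may emit a warning for this kind of initializer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 6-element array contains no
   terminating zero. memchr looks at exactly sizeof a bytes, so it
   never reads past the end of the array. */
int six_chars_no_terminator(void)
{
    char a[6] = "blabla";   /* valid C: there is no room for the '\0' */

    assert(a[5] == 'a');    /* the last element is the final letter */
    return memchr(a, '\0', sizeof a) == NULL;
}
```

Because a is not a string here, it must not be passed to functions such as strlen or printed with a plain %s.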

Upvotes: 4

gabry

Reputation: 1422

"blabla" is what the book says: an array of 7 characters, the last being '\0', placed in read-only data space (when possible).

(1) When you write:

 char a[7] = "blabla";

You tell the compiler to create a mutable array of 7 characters on the stack and to copy the read-only array into it. Note that you can also write:

 char a[] = "blabla";

... that is safer because the compiler will count the characters for you.

(2) Given the fact a[] is a copy of "blabla" you can write to it without problems. If you want to keep the read only property you can write:

const char *a = "blabla";

This time a is a pointer to constant characters (the pointer itself is not const), so the contents of the string cannot be modified through it. You can still reassign the pointer:

const char *a = "blabla";
a = "blublu";
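Both points can be tried out with a small sketch like the one below. The function names inferred_size and reassign_pointer are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* With an empty [] the compiler counts the characters itself,
   including the terminating '\0'. */
size_t inferred_size(void)
{
    char a[] = "blabla";
    assert(a[sizeof a - 1] == '\0');
    return sizeof a;
}

/* The pointer can be redirected to another literal, but the
   characters it points to cannot be changed through it. */
int reassign_pointer(void)
{
    const char *a = "blabla";
    a = "blublu";            /* fine: only the pointer changes */
    /* a[0] = 'B'; */        /* would not compile: *a is const */
    return strcmp(a, "blublu") == 0;
}
```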

Upvotes: 2

Sourav Ghosh

Reputation: 134356

First case,

char a[7] = "blabla" , is not possible [...]

Yes, it is possible, this is an initialization.

Quoting C11, chapter §6.7.9/P14, Initialization,

An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
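As a small illustration of the "if there is room" part, here is a sketch with an array larger than the literal. It relies on a rule that is not in the quote above: elements without an explicit initializer are set to zero. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every element after the six
   letters is zero when the array is larger than the literal. */
int tail_is_zero_filled(void)
{
    char a[10] = "blabla";   /* room for the letters, the '\0', and more */

    assert(a[0] == 'b' && a[5] == 'a');
    for (size_t i = 6; i < sizeof a; i++) {
        if (a[i] != '\0') {
            return 0;
        }
    }
    return 1;
}
```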

Second case,

it says when you see "blabla" it means like a const char array , and that means I can't change a, at all (which is not true).

[From the point of view of directly attempting to modify a string literal]

You can, but you MUST not.

From chapter §6.4.5

[...] If the program attempts to modify such an array, the behavior is undefined.

That said, in your case, a is not a pointer to the string literal; it is an array whose elements are initialized with the content of the string literal. You are perfectly allowed to modify the contents of the array a.

Upvotes: 3
