Reputation: 1242

Which one is the correct address of this char array in C?

#include<stdio.h>
int main()
{
  char *str1 = "computer";

  printf ("%p\n", (void *)   str1);   // i
  printf ("%p\n", (void *)  &str1);   // ii
  printf ("%d\n",           *str1);   // iii
  printf ("%p\n", (void *)  *&str1);  // iv
  printf ("%p\n", (void *)  &*str1);  // v

  return 0;
}

I know that & is the 'address of' operator and * is the 'value at address' operator. According to me,

i. This is the address of the char array since the name of the array represents the address of the first element.

ii. I do not understand this as address of an address seems confusing.

iii. This should represent the value at address (that is computer) but the format specifier is %d and a seemingly random number gets printed in the output.

iv. This should mean value at address of address. This is again confusing.

I'm getting random values in the output. I understand that some of these could be because of undefined behavior. But if someone could clarify the meaning of these, I will be very grateful.

I also want to say that

printf ("%p\n",  **str1);
printf ("%p\n",  &&str1);

give compilation failed errors.

Upvotes: 4

Answers (5)

torek

Reputation: 489558

The only array in the sample code is the one produced by the string literal:

"computer"

In most contexts,¹ a string literal in C directs the compiler to produce (or act as if it had produced) an array somewhere in memory, with the contents of that array filled with the characters making up the string literal. You are not allowed to write on the array elements—as if the array had type const char [N] for the appropriate integer constant N—but the array itself actually has type char [N]. (The array may be, and on most modern compilers is-if-possible, placed in physically-read-only storage such as ROM or text space, although many compilers have directives that can affect this. As others noted the rules are a bit different for C++. The integer constant N is determined by the number of characters in the string literal, with a terminating '\0' char added.)

In general, given a variable v of type "array N of T" (for some type T), you simply write &v to get its address. This is a value of type "pointer to array N of T". This works for string literals as well:²

char (*p_a)[6] = &"hello";

Here p_a is a pointer to the array. The difference between this and:

char *p_c = "world";

is that p_c is a pointer to a single char rather than a pointer to the (entire) array. Hence p_c[1] is the second char in that array, the 'o'. But p_a[1] would be the entire 6-char-long array that follows the first array "hello", if there were such an array (there is not one, and hence attempting to access p_a[1] produces undefined behavior).
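A minimal sketch of that difference, reusing the p_a and p_c declarations above. The three function names are hypothetical, made up for illustration. The code only inspects sizes and the stride of pointer arithmetic; it computes p_a + 1 (a valid one-past-the-end pointer) but never dereferences it.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the object a char * designates: one char. */
size_t pointee_size_char_ptr(void)
{
    char *p_c = "world";
    assert(p_c != NULL);
    return sizeof *p_c;              /* sizeof(char), which is always 1 */
}

/* Size of the object a char (*)[6] designates: the whole array. */
size_t pointee_size_array_ptr(void)
{
    char (*p_a)[6] = &"hello";
    return sizeof *p_a;              /* five letters plus the '\0' */
}

/* Number of bytes that p_a + 1 steps over; computed, never dereferenced. */
size_t array_ptr_stride(void)
{
    char (*p_a)[6] = &"hello";
    return (size_t)((char *)(p_a + 1) - (char *)p_a);
}
```

Called from a main of your own, the first function should report 1 and the other two 6, which is the "size of the circle" difference in numbers.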

Literally speaking, then, the answer to the question as asked is "none of them". None of these expressions point to the entire array all at once!

Since we know that "computer" consists of 9 chars—the eight visible chars and the one terminating '\0'—we can take the value stored in str1, which is a pointer to the first of those 9 chars, and convert it (via casts and/or assignments) into the correct type, char (*)[9], in order to get "the address of the (entire) array":

char (*answer_1)[9] = (void *)str1;

or:

((char (*)[9]) str1); /* answer2: produce the value, then ignore it */

for instance.

The thing about pointers and arrays in C, though, is that a pointer to the first element of an array is, for all practical purposes, "just as good" as a pointer to the entire array. Hence just plain str1, which is a pointer to the first element of the 9-char-long array, is "just as good" as a pointer to the entire 9-char-long array.
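As a rough illustration of "just as good", the sketch below compares the two pointers after converting both to void *. The function names are hypothetical; the conversion to char (*)[9] is the one shown earlier in this answer.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the pointer to the first char and the pointer to the
   whole 9-char array hold the same address once converted to void *. */
int same_address(void)
{
    char *str1 = "computer";
    char (*whole)[9] = (void *)str1;   /* pointer to the entire array */
    assert(str1 != NULL);
    return (void *)str1 == (void *)whole;
}

/* Reads the first element through the pointer-to-array. */
char first_via_array_ptr(void)
{
    char *str1 = "computer";
    char (*whole)[9] = (void *)str1;
    return (*whole)[0];                /* same char as str1[0] */
}
```

The address value is identical either way; only the type, and therefore the size of what is pointed to, differs.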

This all sounds bizarre to people new to C. Ages ago, I made up a graphic that displays the key difference; it can be found here. The essence of the difference is the "size of the circle" rather than the "address value" itself.

Item number 2 (or I should say "ii") in your list, &str1, uses the fact that you have an actual object, str1, of type "pointer to char". The address-of operator & gets you a pointer to that object: a value of type "pointer to (pointer to char)", or char **. The value itself can be thought of as an arrow pointing to the variable str1, while the value stored in str1 itself can be thought of as an arrow pointing to the c in "computer".
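The two arrows can be followed in code. This is only a sketch; the function names and the second literal "keyboard" are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Follows both arrows: pp -> str1 -> 'c'. */
char first_char_via_double_pointer(void)
{
    char *str1 = "computer";
    char **pp = &str1;       /* arrow pointing at the variable str1 */
    assert(*pp == str1);     /* *pp is the value stored in str1 */
    return **pp;             /* **pp is str1[0] */
}

/* Re-aims str1 through pp; only the pointer variable is modified,
   never the string literal itself. */
char retarget_through_double_pointer(void)
{
    char *str1 = "computer";
    char **pp = &str1;
    *pp = "keyboard";        /* same effect as str1 = "keyboard" */
    return str1[0];
}
```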

¹The main exception here is when the string literal is used as an initializer. For instance:

char ch_array[9] = "computer";

exactly fills the 9-char-long array ch_array, without necessarily creating a second array anywhere. In these cases you can control the actual const-ness of the resulting array; ch_array is not const and is something you can write to, so you can change the letters in this array, making it read "catputer", for instance, if you wish.
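A small sketch of that edit, assuming nothing beyond the ch_array declaration above (the function name is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Initializes a writable array from the literal, edits two letters,
   and returns 1 if the result reads "catputer". */
int make_catputer(void)
{
    char ch_array[9] = "computer";   /* a private, modifiable copy */
    assert(sizeof ch_array == 9);    /* eight letters plus '\0' */
    ch_array[1] = 'a';               /* 'o' becomes 'a' */
    ch_array[2] = 't';               /* 'm' becomes 't' */
    return strcmp(ch_array, "catputer") == 0;
}
```

The same two assignments through a char * that points at a string literal would be undefined behavior.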

The C standards also allow you to make the array one element "too short", i.e., to fail to hold the terminating '\0'. In this case the terminator is simply not stored anywhere. There is no standard way to do this without creating such an object, although there was a proposal, which went nowhere, to use a final \z escape inside a string literal to get the same result.

Note that string concatenation:

char *x = "com" "put" "er";

does not insert the terminating '\0' between parts: the terminator is added after concatenation. And, since writing on the anonymous array generated here produces undefined behavior, a compiler can (and good ones do) share string literals as well, possibly with "tail sharing",³ so that:

char *s1 = "hello world", *s2 = "world";

might generate only one array, holding the "hello world" string, and make s2 point to the w within that string.
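The concatenation rule can be checked with sizeof. This is a sketch with hypothetical function names. Whether s1 and s2 actually share storage depends on the compiler and its options, so the code makes no attempt to test that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of the array produced by the concatenated literal. */
size_t concatenated_size(void)
{
    return sizeof("com" "put" "er");   /* eight letters plus ONE '\0' */
}

/* Returns 1 if the concatenated literal is exactly "computer". */
int concatenation_matches(void)
{
    const char *x = "com" "put" "er";
    assert(x != NULL);
    return strlen(x) == 8 && strcmp(x, "computer") == 0;
}
```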

²Back in the 1980s, a lot of compilers got this wrong. To be fair, there was no standard for C at the time: the original ANSI C standard came out in December of 1989, so it was not until 1990 that there were any official rules one could apply to compiler-writers.

³"Tail sharing" is not a term of art, just a phrase I've made up here to attempt to describe the process.

Upvotes: 4

Some programmer dude

Reputation: 409422

Let's take these in order:

1. str returns a pointer to the first character in the array
2. &str returns a pointer to the pointer (i.e. it points to where the variable str is located)
3. *str dereferences the pointer, and returns the value pointed to, in other words it returns the first character in the string (so it's no pointer at all)
4. *&str returns the same as 1, since the dereference and address-of operators cancel each other out
5. &*str is the same as above, the address-of and dereference cancel each other out

The difference between points 4 and 5 is the order in which the expression is evaluated, but the result is the same.
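A quick way to convince yourself of points 3 to 5 is to compare the expressions directly. This is only a sketch and the function names are invented for it.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if both *&str and &*str yield the same pointer as str. */
int operators_cancel(void)
{
    char *str = "computer";
    assert(str != NULL);
    return (*&str == str) && (&*str == str);
}

/* *str is a plain char (the first character), not a pointer. */
char dereferenced_value(void)
{
    char *str = "computer";
    return *str;
}
```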

In your case with a char pointer, the "correct" way is 1.

As for your errors for **str and &&str: the first is because *str results in a char, and that's not a type you can dereference. The one with the double ampersand is because && is the logical AND operator, applied here in a unary expression instead of a binary expression.

An unrelated point though: String literals are arrays of characters that must not be modified (in C their type is char[N] rather than const char[N], but writing to them is undefined behavior), with the size equal to the number of characters in the string plus one for the string terminator. That means a plain non-const pointer invites mistakes, so the declaration is better written as either

const char *str = "computer";

or possibly (the two forms are equivalent, and both work in C)

char const *str = "computer";

Upvotes: 2

Deduplicator

Reputation: 45684

All your examples cause undefined behavior, because printf with format specifier %d expects a signed int, not a pointer.

As stdcall points out, the proper format specifier for data pointers (though only for pointers to characters and void-pointers, the rest must, strictly speaking, be cast to either of them) is %p, which will print the pointer in an implementation-defined format.

This is an rvalue of type char**: &str1. Cast to void* and print with %p.
These are lvalues of type char*: str1 and *&str1. Print the string with %s and the pointer value with %p.
This is an rvalue of type char*: &*str1. Same as above.
This is an lvalue of type char: *str1. Print with %c.

All char* point to the same object.
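Put together, the matching specifiers look roughly like this. The function names are hypothetical, and the %p output is implementation-defined, so no particular text is shown for it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats *str1 with %c and str1 with %s into out.
   Returns what snprintf returns: the length of the formatted text. */
int format_char_and_string(char *out, size_t out_size)
{
    char *str1 = "computer";
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "%c %s", *str1, str1);
}

/* Prints both pointer values with %p after casting to void *. */
void print_pointer_values(void)
{
    char *str1 = "computer";
    printf("%p\n", (void *)str1);    /* where the characters live */
    printf("%p\n", (void *)&str1);   /* where the variable str1 lives */
}
```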

Be aware that even though string literals have type char[] for historic reasons, they are constants which might be shared: Trying to change them results in UB.

Upvotes: 1

Holt

Reputation: 37641

printf ("%d", str1);   // i

This is the address of your "string", i.e. where your char array is stored (equivalent to &str1[0]). The type of str1 is obviously char*, which does not match the %d specifier, but most implementations will output the address as an integer.

printf ("\n%d", &str1);   // ii

str1 holds the address of the first char of the string (i.e. &str1[0]), so &str1 points to str1:

&str1 -> str1 -> "computer"

printf ("\n%d", *str1);   // iii

This is simply equivalent to str1[0], which is 'c'. Since you use the %d specifier, the value output is the character code of 'c' (99 in ASCII).

printf ("\n%d", *&str1);  // iv
printf ("\n%d", &*str1);  // v

Well, *&a and &*a are equivalent in most (all?) cases to a, so here you just get the first behaviour.

printf ("\n%d", **str1);

str1 is a pointer, so you can dereference it to get 'c', but *str1 is 'c' and you can't dereference a char.

printf ("\n%d", &&str1);

&str1 is the address of str1 but it's not a variable so it does not have any address so &&str1 is not possible (the address of the address of str1 does not exist).

As mentioned in the other answers, the correct way to output a pointer is %p. You get seemingly random values in all cases (normally except iii) because str1 does not always have the same address.

Upvotes: 0

Luca Davanzo

Reputation: 21528

printf ("\n%d", *str1); // iii

This doesn't print a random number, but the ASCII value of the first char.

*str1 is, in fact, the value of the first char: 'c' == 99

With *(str1 + i) you access element i of str1.
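For instance, a sketch with made-up helper names (the value 99 assumes an ASCII-compatible character set):

```c
#include <assert.h>
#include <stddef.h>

/* Returns element i of s using pointer arithmetic; same as s[i].
   The caller must keep i within the string, terminator included. */
char element_at(const char *s, size_t i)
{
    assert(s != NULL);
    return *(s + i);
}

/* The first char converted to int: what %d prints for *str1
   (99 for 'c' on ASCII systems). */
int code_of_first(const char *s)
{
    assert(s != NULL);
    return *s;
}
```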

Upvotes: 0
