user3502374

Reputation: 779

Difficulty understanding C pointers when the * is on its own (Part 2)

I asked this question earlier when I saw actual multiplication, and everyone helped clear up my understanding of it (which I very much appreciate).

However, this continues to bother me in a different situation.

For example,

int main(int argc, const char * argv[] )

Why isn't the line above written like the one below?

int main(int argc, const char *argv[] ) 

Is there a difference? Sometimes I see the * in funky locations, and I just don't understand why it seems to float around wherever it likes (or wherever the author, who presumably knew what they were doing, chose to put it).

Upvotes: 2

Views: 76

Answers (2)

John Bode

Reputation: 123558

In this case, whitespace doesn't matter. You could write it as

char *argv[]

or

char* argv[ ]

or

char      *        argv     [              ]

or even

char*argv[]

All four forms are parsed as char (*argv[]); the * operator is part of the declarator (as is the [] operator).

During compilation, your source code is broken up into tokens; informally speaking, a token is the smallest meaningful part of a program (keywords, operators, identifiers, literals, punctuators, etc.). Tokens are grouped into expressions and statements according to the language grammar.

C's tokenizing algorithm is "greedy"; it will try to form the longest possible tokens from the source text, so sometimes you need to separate tokens with whitespace. For example, if the compiler sees the text "charfoo;", it will tokenize it into charfoo and ; (the ; is a separate token, since the ';' character isn't part of an identifier). It doesn't recognize char as a separate token, since it's not looking for specific keywords at this stage. In order for it to be recognized as a declaration, you need at least one whitespace character between char and foo: "char foo;". Whitespace, like ';', isn't part of an identifier (or any other token), so it separates the char and foo tokens from each other. Since whitespace has no meaning beyond separating tokens at this stage, it doesn't matter whether you use one space or ten or a hundred; "char foo ;" works just as well.

Throw the '*' character into the mix, though, and you don't need the whitespace: "char*foo;" will be tokenized into char, *, foo, and ;, because the '*' character, like whitespace and the ';' character, isn't part of an identifier. It's a distinct token all its own. Adding whitespace between those tokens doesn't hurt, but as far as the compiler is concerned, it doesn't change anything.

Thus, char * argv[] is tokenized as char, *, argv, [, and ]. Because you have * separating char and argv, it doesn't matter how much or how little whitespace comes between them.

At this point, the compiler uses the language grammar to determine how the sequence of tokens should be interpreted. char is a type specifier, so the compiler assumes that this is the beginning of a declaration. Since this is a declaration, the * operator is interpreted as the unary indirection operator, not the binary multiplication operator.

Next, the precedence of operators comes into play. Postfix operators like [] have higher precedence than unary operators, so *a[N] is parsed as *(a[N]) (a is an N-element array of pointers). Thus, argv is an array of pointers to char.

Unfortunately, there's some additional weirdness at this point: the way C handles arrays is ... different ... from other languages, and because argv is a function parameter, this particular declaration actually declares argv as a pointer to a pointer to char.

That, however, is a question for another day.

Upvotes: 1

zwol

Reputation: 140758

The two lines of code you have shown have exactly the same meaning. C in general ignores horizontal whitespace; the only times it makes a difference are (1) inside string literals, (2) when it changes the boundaries of tokens (consider ++x versus + + x), (3) when it controls whether you are #define-ing a function-like or object-like macro.

In the context of a type declaration, * is not the binary multiplication operator, it is the unary pointer-to modifier, and it always affects the thing to its right. For clarity, one should always write it with a space on the left and no space on the right, but the compiler does not care.

(The style, common particularly in C++, of cuddling the star to the left is wrong, because it's misleading: char* a, b does not declare both a and b as pointers. char *a, b, by contrast, makes clear that the star only affects a. C++ grognards will say that you shouldn't write multiple variable declarations on one line in the first place, which is a reasonable position in itself, but not a valid excuse for misleading human readers about the direction in which * binds. However, this is all strictly a matter of style; again, the compiler cares not.)

Upvotes: 7
