Reputation: 4534
I was messing around in C, and decided it'd be cool to try changing up the type of argv from char * to int, just to see what would happen. I wrote this:
#include <stdlib.h>
#include <stdio.h>
int main(int argc, int argv )
{
    printf("arg is %d \n", argv);
}
I get really weird output from this program. Whenever I run it, with whatever arguments I run it with, it seems to just spit back random numbers at me. Here is the output:
[14:30:00][maksim]~/learnProg/cDance$ ./dink
arg is -2058142376
[14:30:01][maksim]~/learnProg/cDance$ ./dink 2141
arg is 2111473256
[14:30:04][maksim]~/learnProg/cDance$ ./dink 2141
arg is -8005928
(the program is called dink). What's going on? What does C do when it compiles this? What would happen if I used data types other than int, like a double or a structure or whatever?
Upvotes: 2
Views: 539
Reputation: 702
I don't disagree with most of the other answers and, as jamesdlin noted, C99 specifies the behavior as undefined if main is not declared properly. I think your question then becomes what this so-called undefined behavior actually is. I say "so-called undefined behavior" because it is in fact defined very precisely as part of a platform/system Application Binary Interface (ABI). While the ABI may not specifically address the situation you pose with passing a pointer as an int, it does define how arguments are passed, so a little research will reveal exactly what happens in your particular scenario.
Since the ABI answers all of the questions about "what happens if I pass this as an int, double, or structure", your next question might be "what is the ABI for my system". The ABI is system/platform specific: it can differ between Windows and Linux, between PowerPC and x86, between different compilers, and even between different versions of a compiler. You didn't provide the platform/system information needed to answer the "which ABI" question but, even if you had, I have no intention of answering it, since research would be required on my part (I'm no expert). Besides, this is your experiment, so researching and understanding the ABI of your system will be a good learning experience for you.
There is a lot of good information out there, including a question asking what the ABI is, a brief overview of the Linux ABI and, of course, the Wikipedia page. The ABI question provides a link to the System V ABI PDF, which very possibly covers your system's ABI, so it might be the best place to start.
To summarize, your experiment results in undefined behavior according to C99, but the actual behavior is determined by the system ABI, which is, well, system-specific. In other words, C99 does not specify the behavior in your experiment because it is system-specific behavior that is outside of C99. The system-specific ABI, on the other hand, does define the behavior as part of the definition of how arguments are passed. By understanding your system's ABI you will be able to understand (i.e. define) the behavior you are seeing. Most likely this definition will be somewhat unimpressive. For example, the int argument and pointer argument are not compatible, so what you receive as the int argument truly is random garbage that happens to reside in a certain register or memory location. Or it could be the upper or lower 32 bits of a 64-bit pointer.
Upvotes: 0
Reputation: 488183
As others noted, the behavior is undefined (so anything might happen).
Let's look at three "typical" behaviors though. Three common ways to pass arguments are:
- on a stack (or several stacks);
- in general purpose registers;
- in special purpose registers.
Intel x86 systems mostly use the first method (but sometimes the second or third). MIPS-based processors mostly use the second.
If a system uses one or more stacks, the usual calling method is:
- In the caller (whatever calls main), push arguments, typically right to left, i.e., in reverse order. Stack pushes usually (but not always) look like *--sp = value; in C, with the stack pointer(s) descending from some high address.
- Call the function (main).
- In the callee, access the arguments as sp[0], sp[1], etc. If the calling mechanism uses the same stack as the parameter-passing mechanism, the indexes may start at 1 or 2 or even more (sp[2] being the first argument, for instance, and sp[3] being the second).
In this case, argc will probably come out correct but argv will misinterpret whatever the caller pushed, producing a strange-looking int. If the underlying system is sufficiently fancy (checking types), it might detect that the caller pushed a value of type char ** but you're accessing one of type int, and give you some kind of run-time error. Most systems simply prefer to give you the wrong answer as fast as possible, though, skipping the type-checking. So you'll get a strange-looking int, but it will actually be based (at least in part; see below) on the actual pointer value the caller tried to pass.
If the system uses general purpose registers (instead of, or prior to, using a stack; systems using GPRs often fall back on stacks if you use many parameters, and sometimes use them for all variadic functions, i.e., those using the <stdarg.h> facilities), then the calling method looks more like this:
- Load the arguments (the int argc value and char **argv value) into the first two argument registers (e.g., %o0 and %o1 on SPARC, or $a0 and $a1 on MIPS).
In this case, the code generally behaves the same as on the stack-based system. It just runs faster, since arguments-in-registers tend to need fewer CPU cycles than arguments-in-memory. (This is why some Intel compilers will sometimes pass an argument or two in registers.)
If the system uses special purpose registers, though, we get a new apparent behavior. Let's say that floating point values go in f registers (true on some SPARC systems; x86 has the MMX and SSE registers instead); pointer values go in a registers (a la 680x0 CPUs); and integer values go in d registers (680x0, again; in practice most 680x0 systems just use "the stack", but let's assume we have one that uses registers). This time, the thing calling main needs to pass one integer, argc, and one pointer, argv, so it does this:
- put argc into data register d0
- put argv into pointer register a0
- call main
Now, in main(), you told the compiler to expect two integer arguments, which would arrive in registers d0 and d1 respectively. What's in CPU register d1? Who knows: the thing that called main did not set it just before the call. It has whatever value it has, from whoever last stuck some value in it. The value is no longer associated with the intended argv, since that's in register a0.
Now, even if you have a stack or GPR-based calling system, there are a few more wrinkles to consider:
- What if pointers are 64 bits but ints are only 32 bits? In this case, the caller pushes a 64-bit value, or writes a 64-bit value into the parameter-register; but main looks only at 32 bits. You'll see half of what was actually given.
- What if pointers are 32 bits but ints are 64 bits? That's an unusual implementation, to be sure, but now you'll be looking at all 64 bits of a value that only supplied 32. The "extra" 32 bits might be all zero (this would be typical for parameters in GPRs), or might be 32 bits of some unrelated value, similar to the case of inspecting register d1 when main's caller filled in register a0.
There's one other noteworthy possibility. If you build similar C++ code (with a function other than main), it generally fails to link. The reason is that C++ compilers often use a technique called "name mangling" to handle overloaded functions. Under the Itanium C++ ABI scheme used by GCC and Clang, a function named f that takes one int and one char ** argument and returns int produces the link-time symbol _Z1fiPPc. A function named f that takes two ints and returns int produces the link-time symbol _Z1fii instead. I haven't seen C compilers that do this, but they could do it. In this case, the compiler would check, at link time, whether your program defined _Z4mainiPPc (int main(int, char **)) and if so, link in the caller that provides those arguments; or it would check for _Z4mainv (int main(void)) and in that case link in the caller that provides no arguments. If neither function is found, the linker could detect that you wrote an incorrect main and not produce an executable at all!
Upvotes: 4
Reputation: 1484
Let's first understand what exactly argv is.
Consider the standard main() format. It is int main(int argc, char *argv[]).
Here argv is declared as an array of character pointers. Since an array parameter is adjusted to a pointer to its first element, argv is really a pointer to a character pointer.
Note that the name doesn't matter here. It can be anything besides argv. What matters is that the second argument to main() is a pointer to a character pointer, i.e. a pointer to pointer to char.
So when the program starts execution, a memory address is passed as the second argument to main(), and that address is the address of another pointer. That other pointer holds the memory address of the very first character of the very first argument, which happens to be the program's name.
So when you say int main(int argc, int argv) you are treating an address as an int value. If sizeof(int) == sizeof(int *), the value isn't truncated in practice, although the behavior is still undefined as far as the standard is concerned.
Now when you say printf("arg is %d \n", argv); you are simply printing that address. That's it! No matter what arguments you give the program, that address is effectively arbitrary. That's why you are getting the random numbers: each one is the address of the first element of the argv array. That element in turn holds the address of the program name, i.e. of its very first character.
To verify this (on a system where int is as wide as a pointer), add this line to your code snippet:
printf("%c\n", **(char **)argv);
You will see . being printed, which is indeed the very first character of the very first argument, ./dink.
Upvotes: 0
Reputation: 887453
argv is passed to your program as a pointer to an array of pointers to strings.
If you lie and tell the compiler that it's an int, the bytes of the pointer will be interpreted as an int, and you'll get a memory address. (On a 64-bit system, you'll probably get a crash.)
If you pretend that it's a float, the compiler will probably interpret those bytes / bits as an IEEE 754-encoded floating-point value, resulting in a differently weird number. (What exactly happens depends on the calling convention.)
If you pretend that it's any type which is not the same width as a pointer, you will probably crash.
C does exactly what you tell it to. It is up to you to tell it how to interpret things.
Upvotes: 2
Reputation: 89975
You'll get undefined behavior, which means it's legitimate for anything to happen. main must be declared as:
int main(void)
or as:
int main(int argc, char** argv)
or as some form specified by your implementation.
From section J.2 of the ISO C99 standard:
The behavior is undefined in the following circumstances:
...
- A program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1).
Upvotes: 3
Reputation: 1847
The C main() function receives an integer for the argument count and a pointer to an array of pointers to char.
Your output is simply the memory address which this pointer contains. If you reinterpret it as other variable types, they will also contain "rubbish".
Under normal circumstances, casting pointers should be avoided if possible.
Upvotes: 0
Reputation: 1788
Well...
Argv is an array. In C, arrays are passed to functions as pointers. Pointers are internally just integers for memory locations. So, the numbers you saw are locations in memory. (I'm guessing the negatives are because it's printed as a signed value.)
Upvotes: 0