MYV

Reputation: 4534

what does C do if argv is a type other than char **

I was messing around in C and decided it'd be cool to try changing the type of argv from char ** to int, just to see what would happen. I wrote this:

#include <stdlib.h>
#include <stdio.h>
int main(int argc, int  argv )
{
        printf("arg is %d \n", argv);
}

I get really weird output from this program. Whenever I run it, with whatever arguments I run it with, it seems to just spit back random numbers at me. Here is the output:

[14:30:00][maksim]~/learnProg/cDance$ ./dink
arg is -2058142376 
[14:30:01][maksim]~/learnProg/cDance$ ./dink 2141
arg is 2111473256 
[14:30:04][maksim]~/learnProg/cDance$ ./dink 2141
arg is -8005928 

(The program is called dink.) What's going on? What does C do when it compiles this? What would happen if I used data types other than int, like a double or a structure or whatever?

Upvotes: 2

Views: 539

Answers (7)

slowjelj

Reputation: 702

I don't disagree with most of the other answers and, as jamesdlin noted, C99 specifies the behavior as undefined if main is not declared properly. Your question then becomes what this so-called undefined behavior actually is. I say "so-called undefined behavior" because it is in fact defined very precisely as part of a platform/system Application Binary Interface (ABI). The ABI may not specifically address the situation you pose, with a pointer being received as an int, but it does define how arguments are passed, so a little research will reveal exactly what happens in your particular scenario.

Since the ABI answers all of the questions about "what happens if I pass this as an int, double, or structure", your next question might be "what is the ABI for my system". The ABI is system/platform specific: it can differ between Windows and Linux, between PowerPC and x86, between different compilers, and even between different versions of a compiler. You didn't provide the platform/system information needed to answer the "which ABI" question but, even if you had, I have no intention of answering it, since that would require research on my part (I'm no expert). Besides, this is your experiment, so researching and understanding the ABI of your system will be a good learning experience for you.

There is a lot of good information out there, including a question asking what the ABI is, a brief overview of the Linux ABI and, of course, the Wikipedia page. The ABI question provides a link to the System V ABI PDF, which very possibly covers your system's ABI and so might be the best place to start.

To summarize, your experiment results in undefined behavior according to C99, but the actual behavior is determined by the system ABI, which is, well, system-specific. In other words, C99 does not specify the behavior in your experiment because it is system-specific behavior that lies outside of C99. The system-specific ABI, on the other hand, does define the behavior as part of the definition of how arguments are passed. By understanding your system's ABI you will be able to understand (i.e. define) the behavior you are seeing. Most likely this definition will be somewhat unimpressive. For example, the int argument and the pointer argument may not be compatible, so what you receive as the int argument truly is garbage that happens to reside in a certain register or memory location. Or it could be the upper or lower 32 bits of a 64-bit pointer.

Upvotes: 0

torek

Reputation: 488183

As others noted, the behavior is undefined (so anything might happen).

Let's look at three "typical" behaviors though. Three common ways to pass arguments are:

  • on a stack
  • in general purpose registers
  • in special purpose registers

Intel x86 systems mostly use the first method (but sometimes the second or third). MIPS-based processors mostly use the second.

If a system uses one or more stacks, the usual calling method is:

  • in the caller (some OS-supplied routine that calls main), push arguments, typically right to left, i.e., in reverse order. Stack pushes usually (but not always) look like *--sp = value; in C, with the stack pointer(s) descending from some high address.
  • make the call into the target function (main)
  • in the target function, retrieve parameters off "the stack" or "the parameter stack" or "the current thread stack" or whatever the system uses. Because they were pushed in reverse order, they are at addresses like sp[0], sp[1], etc. If the calling mechanism uses the same stack as the parameter-passing mechanism, the indexes may start at 1 or 2 or even more (sp[2] being the first argument, for instance, and sp[3] being the second).
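The stack-based steps above can be sketched in plain C. This is only a toy model under stated assumptions: stack_mem, push, push_main_args and the read_* helpers are hypothetical names, one stack slot is assumed to be one pointer-sized word, and no real ABI is being reproduced.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a descending stack; not a real ABI. */
static uintptr_t stack_mem[STACK_WORDS];
static uintptr_t *sp = stack_mem + STACK_WORDS;  /* starts one past the top */

/* Push one word: the stack pointer moves down, then the value is stored. */
void push(uintptr_t value)
{
    *--sp = value;
}

/* Caller side: push the arguments right to left, so argv goes first. */
void push_main_args(int argc, char **argv)
{
    push((uintptr_t)argv);
    push((uintptr_t)argc);
}

/* Callee side: the first argument ends up at sp[0]. */
int read_argc(void)
{
    return (int)sp[0];
}

/* Callee declared correctly: sp[1] is read back as a char **. */
char **read_argv_as_pointer(void)
{
    return (char **)sp[1];
}

/* Callee declared with int argv: the same slot is read as an int.
   The narrowing conversion is implementation-defined; typically the
   low-order bits of the pointer survive. */
int read_argv_as_int(void)
{
    return (int)sp[1];
}
```

After a call such as push_main_args(1, args), read_argv_as_pointer() gives back args, while read_argv_as_int() gives whatever int the implementation derives from the same slot.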

In this case, argc will probably come out correct but argv will misinterpret whatever the caller pushed, producing a strange-looking int. If the underlying system is sufficiently fancy (checking types), it might detect that the caller pushed a value of type char ** but you're accessing one of type int, and give you some kind of run-time error. Most systems simply prefer to give you the wrong answer as fast as possible, though, skipping the type-checking. So you'll get a strange-looking int, but it will actually be based (at least in part; see below) on the actual pointer value the caller tried to pass.

If the system uses general purpose registers (instead of, or prior to, using a stack—systems using GPRs often fall back on stacks if you use many parameters, and sometimes use them for all variadic functions, i.e., those using the <stdarg.h> facilities), then the calling method looks more like this:

  • in the caller, move arguments (int argc value and char **argv value) into the first two argument registers (e.g., %o0 and %o1 on SPARC, or $a0 and $a1 on MIPS).
  • make the call to the target function
  • in the target function, access the values from the argument registers

In this case, the code generally behaves the same as on the stack-based system. It just runs faster, since arguments-in-registers tend to need fewer CPU cycles than arguments-in-memory. (This is why some Intel compilers will sometimes pass an argument or two in registers.)

If the system uses special purpose registers, though, we get a new apparent behavior. Let's say that floating point values go in f registers (true on some SPARC systems; x86 has the MMX and SSE registers instead); pointer values go in a registers (a la 680x0 CPUs); and integer values go in d registers (680x0, again—although in practice most 680x0 systems just use "the stack", but let's assume we have one that uses registers). This time, the thing calling main needs to pass one integer, argc, and one pointer, argv, so it does this:

  • move integer argument argc into data register d0
  • move pointer argument argv into pointer register a0
  • call main

Now, in main(), you told the compiler to expect two integer arguments, which would arrive in registers d0 and d1 respectively. What's in CPU register d1? Who knows? The thing that called main did not set it just before the call. It has whatever value it has, from whoever last stuck some value in it. The value is no longer associated with the intended argv, since that's in register a0.
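A rough way to picture the special-purpose-register case is a toy register file. Everything here is an assumption for illustration: struct regs, call_main_setup and second_int_param are hypothetical names, and the d/a split is only loosely modelled on the 680x0 description above.

```c
#include <stdint.h>

/* Hypothetical register file with separate data and address registers. */
struct regs {
    int32_t   d[2];   /* data registers d0, d1 */
    uintptr_t a[2];   /* address registers a0, a1 */
};

/* Caller side: argc goes in d0, argv goes in a0. d1 is left untouched,
   so it keeps whatever value was there before the call. */
void call_main_setup(struct regs *r, int argc, char **argv)
{
    r->d[0] = argc;
    r->a[0] = (uintptr_t)argv;
}

/* Callee declared as main(int, int): the second int is read from d1,
   which the caller never wrote. */
int32_t second_int_param(const struct regs *r)
{
    return r->d[1];
}
```

If d1 happened to hold 12345 before the call, second_int_param() returns 12345 regardless of what argv was, which is the "unrelated value" effect described above.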

Now, even if you have a stack or GPR-based calling system, there are a few more wrinkles to consider:

  • What if pointers are 64 bits and plain ints are only 32 bits? In this case, the caller pushes a 64-bit value, or writes a 64-bit value into the parameter-register; but main looks only at 32 bits. You'll see half of what was actually given.
  • What if pointers are 32 bits and plain ints are 64 bits? That's an unusual implementation, to be sure, but now you'll be looking at all 64 bits of a value that only supplied 32. The "extra" 32 bits might be all zero (this would be typical for parameters in GPRs), or might be 32 bits of some unrelated value, similar to the case of inspecting register d1 when main's caller filled in register a0.
  • And of course, there's nothing that says 32 and 64 bits are the only possible sizes. On IBM AS/400 systems, pointers are a whopping 128 bits long (16 byte tagged pointers), and there is extensive run-time type-checking. These machines work on making sure the code is correct, not merely fast.
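The "half of a 64-bit pointer" case can be shown with ordinary integer arithmetic. This is a sketch, not a statement about any particular system: low_half and high_half are hypothetical helper names, and the sample value 0x00007ffc85533d58 is made up so that its low half matches the first number in the question (-2058142376 has the bit pattern 0x85533d58).

```c
#include <stdint.h>

/* The low 32 bits of a 64-bit value: what a callee that reads a 32-bit
   int would typically see in a register, or on a little-endian stack. */
uint32_t low_half(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/* The high 32 bits, which such a callee never looks at. */
uint32_t high_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

With the made-up value above, low_half() yields 0x85533d58 and high_half() yields 0x00007ffc.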

There's one other noteworthy possibility. If you build similar C++ code (with a function other than main), it generally fails to link. The reason is that C++ compilers often use a technique called "name mangling" to handle overloaded functions. A function named f that takes one int and one char ** argument and returns int produces the link-time symbol _Z1fiPPc. A function named f that takes two ints and returns int produces the link-time symbol _Z1fii instead. I haven't seen C compilers that do this, but they could do it. In this case, the toolchain would check, at link time, whether your program defined _Z4mainiPPc (that is, int main(int, char **)) and, if so, link in the caller that provides those arguments; or it would check for _Z4mainv (that is, int main(void)) and in that case link in the caller that provides no arguments. If neither function is found, the linker could detect that you wrote an incorrect main and not produce an executable at all!

Upvotes: 4

rootkea

Reputation: 1484

Let's first understand what exactly argv is.

Consider the standard form of main(): int main(int argc, char *argv[]). Here argv is declared as an array of character pointers. Since an array parameter is adjusted to a pointer to its first element, argv is really a pointer to a character pointer.

Note that the name doesn't matter here; it can be anything besides argv. What matters is that the second argument to main() is a pointer to a character pointer, i.e. a pointer to pointer to char.

So when the program starts executing, a memory address is passed as the second argument to main(), and it is the address of another pointer. That other pointer holds the memory address of the very first character of the very first argument, which happens to be the program's name.

So when you write int main(int argc, int argv), you are treating an address as an int value. If sizeof(int) == sizeof(char **), then in practice no bits are lost; the value is not truncated in that case.

Now when you write printf("arg is %d \n", argv); you are simply printing that address. That's it! No matter what arguments you give the program, that address is effectively a random value. That's why you are getting random-looking numbers: each one is the address of the first element of the argv array. That element in turn holds the address of the program name, i.e. the address of its very first character.

To verify this, add this line to your code snippet:

printf("%c\n", **(char **)argv);  

You will see . printed (provided an int is wide enough to hold the pointer), which is indeed the very first character of the very first argument, ./dink.
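A variant of this check that does not depend on the width of int is to round-trip the pointer through intptr_t. This is a sketch with hypothetical helper names (pointer_as_integer, first_char_of_first_arg); intptr_t is an optional type in the C standard, though common hosted platforms provide it.

```c
#include <stdint.h>

/* Store the pointer in an integer type that is wide enough to hold it. */
intptr_t pointer_as_integer(char **argv)
{
    return (intptr_t)(void *)argv;
}

/* Convert the integer back to a pointer before dereferencing twice:
   once to reach argv[0], once to reach its first character. */
char first_char_of_first_arg(intptr_t value)
{
    char **argv = (char **)(void *)value;
    return **argv;
}
```

Called with the argv that main receives for ./dink, first_char_of_first_arg(pointer_as_integer(argv)) yields the character '.'.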

Upvotes: 0

SLaks

Reputation: 887453

argv is passed to your program as a pointer to an array of pointers to strings.

If you lie and tell the compiler that it's an int, the bytes of the pointer will be interpreted as an int, and you'll get a memory address. (On a 64-bit system, you'll probably get a crash.)

If you pretend that it's a float, the compiler will probably interpret those bytes / bits as an IEEE 754-encoded floating-point value, resulting in a differently weird number. (What exactly happens depends on the calling convention.)
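To see what "interpreting the same bits as a float" means, here is a small sketch. It assumes float is a 32-bit IEEE 754 type, which is common but not guaranteed by the C standard, and bits_as_float is a hypothetical helper name. It shows only the reinterpretation, not how any calling convention would actually deliver the bits.

```c
#include <stdint.h>
#include <string.h>

/* Copy a 32-bit pattern into a float without converting the value.
   Assumes sizeof(float) == 4 and IEEE 754 single precision. */
float bits_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under that assumption, the pattern 0x3f800000 reads back as 1.0f, while an address-like pattern such as 0x85533d58 reads back as a very small negative number.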

If you pretend that it's any type which is not the same width as a pointer, you will probably crash.

The moral of the story is

C does exactly what you tell it to. It is up to you to tell it how to interpret things.

Upvotes: 2

jamesdlin

Reputation: 89975

You'll get undefined behavior, which means it's legitimate for anything to happen. main must be declared as:

int main(void)

or as:

int main(int argc, char** argv)

or as some form specified by your implementation.

From section J.2 of the ISO C99 standard:

The behavior is undefined in the following circumstances:

...

  • A program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1).

Upvotes: 3

user2328447

Reputation: 1847

The C main() function receives an integer for the argument count and a pointer to an array of pointers to char.

Your output is simply the memory address which this pointer contains. If you cast it to other variable types, they will also contain "rubbish".

Under normal circumstances, casting pointers should be avoided where possible.

Upvotes: 0

bobbybee

Reputation: 1788

Well...

argv is declared as an array, but in C an array parameter is really a pointer. Pointers are internally just integers for memory locations. So, the numbers you saw are locations in memory. (I'm guessing the negatives are because it's not unsigned.)
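The guess about the negative numbers can be checked with a short sketch: the same 32 bits read as unsigned give a positive value. as_unsigned_bits is a hypothetical helper name; the inputs are the numbers from the question.

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit value as unsigned. The conversion is
   well defined in C: the result is the value modulo 2^32. */
uint32_t as_unsigned_bits(int32_t v)
{
    return (uint32_t)v;
}
```

For example, -2058142376 becomes 2236824920 (0x85533d58), which looks much more like the low half of an address.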

Upvotes: 0
