Reputation: 117
I was wondering what all the command-line arguments to the main function are (in C specifically, though I am guessing this applies to other languages too). In my compilers class I heard the instructor briefly mention (possibly I misheard or misunderstood) that there is more to main()'s arguments than is typically mentioned, specifically that some information can be accessed at a negative offset from the argv pointer. I could not find anything by Googling or in the couple of textbooks I have, so I wrote this small program in C to experiment. Here are some questions:
1) The while loop runs 32 times before seg faulting. Why are there 32 parameters in total, where can I find a specification for them, and why 32 rather than some other number?
The information printed is all about the system: pwd, terminal session info, user info and so on.
2) Is there anything that is put onto the stack before main? In a typical call, the arguments to the function are put on the stack before the return address (give or take canaries and other stuff). When a program is launched by the shell, is the process the same, and where can I read about it? I'd really like to know how the shell calls a program and how the memory layout compares to the in-program stack layout.
#include <stdio.h>
#include <ctype.h>
int main(int argc, char * argv[]) {
    void * argall = argv[0];
    printf("argc=%d\n", argc);
    int i = 0;
    while (i < 32) {
    //while (argall) { // tried this to find out that it seg faults at i=32
        printf("arg%d %s\n", i, (char* ) argall);
        i++;
        argall = argv[i];
    }
    printf("negative pointers\n");
    // I don't think dereferencing in this part is quite right, but I am
    // getting chars since I am reading bytes. Output of below code is.
    // How come it is alphabet?
    // I tried reading int values and (char*) for string, but got nothing useful.
    /*
    arg -1 o
    arg -2 n
    arg -3 m
    arg -4 l
    arg -5 k
    */
    printf("arg -1 %c\n", (char) argv-1);
    printf("arg -2 %c\n", (char) argv-2);
    printf("arg -3 %c\n", (char) argv-3);
    printf("arg -4 %c\n", (char) argv-4);
    printf("arg -5 %c\n", (char) argv-5);
    return 0;
}
Thanks a lot! Sorry about the long post.
Update: here is the output that comes from the while loop:
argc=1
arg0 ./main-testing.o
arg1 (null)
arg2 TERM_PROGRAM=iTerm.app
arg3 SHELL=/bin/bash
arg4 TERM=xterm-256color
arg5 CLICOLOR=1
arg6 TMPDIR=/var/folders/d0/<redacted>
arg7 Apple_PubSub_Socket_Render=/private/<redacted>
arg8 OLDPWD=/Users/me/problems
arg9 USER=me
arg10 COMMAND_MODE=unix2003
arg11 SSH_AUTH_SOCK=/private/t<redacted>
arg12 _<redacted>
arg13 LSCOLORS=ExFxBxDxCxegedabagacad
arg14 PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
arg15 PWD=/Users/me/problems/c
arg16 LANG=en_CA.UTF-8
arg17 ITERM_PROFILE=Default
arg18 XPC_FLAGS=0x0
arg19 PS1=\[\033[36m\]\u\[\033[m\]@\[\033[32m\]\h:\[\033[33;1m\]\w\[\033[m\]$
arg20 XPC_SERVICE_NAME=0
arg21 SHLVL=1
arg22 COLORFGBG=7;0
arg23 HOME=/Users/me
arg24 ITERM_SESSION_ID=w0t0p0
arg25 LOGNAME=me
arg26 _=./main-testing.o
arg27 (null)
arg28 executable_path=./main-testing.o
arg29
arg30
arg31
Upvotes: 2
Views: 351
Reputation: 754880
You seem to be using a Mac. On a Mac, main() can receive four pieces of data.
You can use the alternative declaration for main():
int main(int argc, char **argv, char **envp)
and you will then be able to list the environment, as you did by accessing beyond the end of the argument list. The environment follows the arguments, and is also terminated by a null pointer.
Then a Mac has some more data after the environment (you can see executable_path=… in your output). You can find some information about that at Wikipedia under Entry Point, which refers to The char *apple[] Argument Vector:
int main(int argc, char **argv, char **envp, char **applev)
I'm not aware of any standardization for what goes before the argv vector. Accessing that data as single characters is unlikely to be useful. I'd print the data as addresses and look for patterns.
This is some code I wrote a few years ago for trying to find the argument list from environ; it works up until you modify the environment by adding a new variable, which changes where environ points:
#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h> /* putenv(), setenv() */
extern char **environ; /* Should be declared in <unistd.h> */
/*
** The object of the exercise is: given just environ (since that is all
** that is available to a library function) attempt to find argv[0] (and
** hence argc).
**
** On some platforms, the layout of memory is such that the number of
** arguments (argc) is available, followed by the argument vector,
** followed by the environment vector.
**
** argv environ
** | |
** v v
** | argc | argv0 | argv1 | ... | argvN | 0 | env0 | env1 | ... | envN | 0 |
**
** This applies to:
** -- Solaris 10 (32-bit, 64-bit SPARC)
** -- MacOS X 10.6 (Snow Leopard, 32-bit and 64-bit)
** -- Linux (RHEL 5 on x86/64, 32-bit and 64-bit)
**
** Sadly, this is not quite what happens on the other two Unix
** platforms. The value preceding argv0 seems to be a 0.
** -- AIX 6.1 (32-bit, 64-bit)
** -- HP-UX 11.23 IA64 (32-bit, 64-bit)
** Sub-standard POSIX support (no setenv()) and C99 support (no %zd).
**
** NB: If putenv() or setenv() is called to add an environment variable,
** then the base address of environ changes radically, moving off the
** stack onto heap, and all bets are off. Modifying an existing
** variable is not a problem.
**
** Spotting the change from stack to heap is done by observing whether
** the address pointed to by environ is more than 128 K times the size
** of a pointer from the address of a local variable.
**
** This code is nominally incredibly machine-specific - but actually
** works remarkably portably.
*/
typedef struct Arguments
{
char **argv;
size_t argc;
} Arguments;
static void print_cpp(const char *tag, int i, char **ptr)
{
uintptr_t p = (uintptr_t)ptr;
printf("%s[%d] = 0x%" PRIXPTR " (0x%" PRIXPTR ") (%s)\n",
tag, i, p, (uintptr_t)(*ptr), (*ptr == 0 ? "<null>" : *ptr));
}
enum { MAX_DELTA = sizeof(void *) * 128 * 1024 };
static Arguments find_argv0(void)
{
static char *dummy[] = { "<unknown>", 0 };
Arguments args;
uintptr_t i;
char **base = environ - 1;
uintptr_t delta = ((uintptr_t)&base > (uintptr_t)environ) ? (uintptr_t)&base - (uintptr_t)environ : (uintptr_t)environ - (uintptr_t)&base;
if (delta < MAX_DELTA)
{
for (i = 2; (uintptr_t)(*(environ - i) + 2) != i && (uintptr_t)(*(environ - i)) != 0; i++)
print_cpp("test", i, environ-i);
args.argc = i - 2;
args.argv = environ - i + 1;
}
else
{
args.argc = 1;
args.argv = dummy;
}
printf("argc = %zd\n", args.argc);
for (i = 0; i <= args.argc; i++)
print_cpp("argv", i, &args.argv[i]);
return args;
}
static void print_arguments(void)
{
Arguments args = find_argv0();
printf("Command name and arguments\n");
printf("argc = %zd\n", args.argc);
for (size_t i = 0; i <= args.argc; i++)
printf("argv[%zd] = %s\n", i, (args.argv[i] ? args.argv[i] : "<null>"));
}
static int check_environ(int argc, char **argv)
{
size_t n = argc;
size_t i;
unsigned long delta = (argv > environ) ? argv - environ : environ - argv;
printf("environ = 0x%lX; argv = 0x%lX (delta: 0x%lX)\n", (unsigned long)environ, (unsigned long)argv, delta);
for (i = 0; i <= n; i++)
print_cpp("chkv", i, &argv[i]);
if (delta > (unsigned long)argc + 1)
return 0;
for (i = 1; i < n + 2; i++)
{
printf("chkr[%zd] = 0x%lX (0x%lX) (%s)\n", i, (unsigned long)(environ - i), (unsigned long)(*(environ - i)),
(*(environ-i) ? *(environ-i) : "<null>"));
fflush(0);
}
i = n + 2;
printf("chkF[%zd] = 0x%lX (0x%lX)\n", i, (unsigned long)(environ - i), (unsigned long)(*(environ - i)));
i = n + 3;
printf("chkF[%zd] = 0x%lX (0x%lX)\n", i, (unsigned long)(environ - i), (unsigned long)(*(environ - i)));
return 1;
}
int main(int argc, char **argv)
{
printf("Before setting environment\n");
if (check_environ(argc, argv))
print_arguments();
//putenv("TZ=US/Pacific");
setenv("SHELL", "/bin/csh", 1);
printf("After modifying environment\n");
if (check_environ(argc, argv) == 0)
printf("Modifying environment messed everything up\n");
print_arguments();
putenv("CODSWALLOP=nonsense");
printf("After adding to environment\n");
if (check_environ(argc, argv) == 0)
printf("Adding environment messed everything up\n");
print_arguments();
return 0;
}
Upvotes: 3
Reputation: 241901
On Linux, *BSD (and hence Mac OS X), and probably other Unix-like systems, the environ array is constructed on the stack following the argv array.
environ contains all the environment variables as an array of strings, each of the form name=value. While individual environment variables are generally accessed through the getenv function, use of the environ global variable is also permitted (by POSIX).
Looking for these strings on the stack underneath the main call frame is not correct, nor does it offer any advantage over the use of environ.
If you want to look at the actual code, you'll need to dive into the implementation of the execve system call, which is what actually loads and starts a new program. There's what looks like a reasonably accurate discussion of the Linux process startup here on lwn.net, which includes pointers to code repositories. The FreeBSD implementation, which is in many respects similar, is found in /sys/kern/kern_exec.c; you might start reading here.
Upvotes: 1