Reputation: 1093
I wrote a program that should calculate the total size of the arguments passed to the execve system call.
I tested this program with the maximum number of arguments, expecting the "Argument list too long" error to occur only when the ARG_MAX limit is exceeded. In my opinion, the maximum total size of the command line should be as close as possible to the ARG_MAX limit, that is, no additional argument (filename) can be added without exceeding this limit.
But I see different behavior: the number of "unused" bytes fluctuates unpredictably even though the environment and the program name stay unchanged and only the number of arguments changes.
The questions:
Program
The counting algorithm is as follows:
size of argv
+ size of envp
+ size of argc
argv is an array of pointers to strings (pointers to char), so loop through this array and add the string lengths to the result, keeping in mind that each string ends with a null byte. Then add the pointers themselves to the result; a pointer is 8 bytes. Thus: the number of pointers * 8 + the lengths of the strings (each with a null byte).
It is almost the same story with envp: string lengths with the null byte, plus pointers. But the last pointer signals the end of the array by pointing to the null byte, so add 8 bytes + 1 byte to the result.
argc is a simple int.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char *argv[], char *envp[]) {
    size_t char_ptr_size = sizeof(char *);

    // The arguments array total size calculation
    size_t arg_strings_size = 0;
    size_t string_len = 0;
    for(int i = 0; i < argc; i++) {
        // Every string ends with a null byte, so 1 byte is added
        string_len = strlen(argv[i]) + 1;
        arg_strings_size += string_len;
        // printf("%zu:\t%s\n", string_len, argv[i]);
    }
    size_t argv_size = arg_strings_size + argc * char_ptr_size;
    printf( "arg strings size: %zu\n"
            "number of pointers to strings %i\n\n"
            "argv size:\t%zu + %i * %zu = %zu\n",
            arg_strings_size,
            argc,
            arg_strings_size,
            argc,
            char_ptr_size,
            argv_size
    );

    // The environment variables array total size calculation
    size_t env_size = 0;
    for (char **env = envp; *env != 0; env++) {
        char *thisEnv = *env;
        // Every string ends with a null byte, so 1 byte is added
        env_size += strlen(thisEnv) + 1 + char_ptr_size;
    }
    // The last element of "envp" is a pointer to the NULL byte, so size of pointer and 1 is added
    printf("envp size:\t%zu\n", env_size + char_ptr_size + 1);

    size_t overall = argv_size + env_size + sizeof(argc);
    printf( "\noverall (argv_size + env_size + sizeof(argc)):\t"
            "%zu + %zu + %zu = %zu\n",
            argv_size,
            env_size,
            sizeof(argc),
            overall);

    // Find ARG_MAX by system call
    long arg_max = sysconf(_SC_ARG_MAX);
    printf("ARG_MAX: %li\n\n", arg_max);
    printf("Number of \"unused bytes\": ARG_MAX - overall = %li\n\n", arg_max - (long) overall);
    return 0;
}
Testing
1-byte filenames - 975 bytes unused.
$ ./program $(yes A | head -n 209222) # 209223 will cause "Argument list too long"
arg strings size: 418454
number of pointers to strings 209223
argv size: 418454 + 209223 * 8 = 2092238
envp size: 3944
overall (argv_size + env_size + sizeof(argc)): 2092238 + 3935 + 4 = 2096177
ARG_MAX: 2097152
Number of "unused bytes": ARG_MAX - overall = 975
2-byte filenames - 3206 bytes unused.
$ ./program $(yes AA | head -n 189999)
arg strings size: 570007
number of pointers to strings 190000
argv size: 570007 + 190000 * 8 = 2090007
envp size: 3944
overall (argv_size + env_size + sizeof(argc)): 2090007 + 3935 + 4 = 2093946
ARG_MAX: 2097152
Number of "unused bytes": ARG_MAX - overall = 3206
3-byte filenames - 2279 bytes unused.
$ ./program $(yes AAA | head -n 174243)
arg strings size: 696982
number of pointers to strings 174244
argv size: 696982 + 174244 * 8 = 2090934
envp size: 3944
overall (argv_size + env_size + sizeof(argc)): 2090934 + 3935 + 4 = 2094873
ARG_MAX: 2097152
Number of "unused bytes": ARG_MAX - overall = 2279
This question is part of another question of mine: How calculate the number of files which can be passed as arguments to some command for batch processing?
Upvotes: 4
Views: 1470
Reputation: 33621
TL;DR The issues are caused by ASLR (address space layout randomization). See the UPDATE section below [after my original answer] for an explanation.
As paladin mentioned, this is system specific. For example, for FreeBSD, the number is much less.
A few things to note [under Linux] ...
ARG_MAX is defined as 131072 [which is 32 4K pages].
_SC_ARG_MAX returns 2097152 [which is 2MB].
The claim in bits/param.h: "The kernel headers define ARG_MAX. The value is wrong, though."
However, as measured, it seems to be right.
From the code in linux/fs/exec.c, it checks against the [hardwired] value of ARG_MAX. It also checks against _STK_LIM [which is 8MB] and rlimit(RLIMIT_STACK) [which defaults to _STK_LIM].
The best way to get the real limit is to count the size of argv and envp, which you do. But you don't account for the size of the NULL pointer at the end of each.
I'd do a binary search on the amount of data that gets passed [checking for E2BIG]:
#define _GNU_SOURCE
#include <linux/limits.h>
long arg_lgx = ARG_MAX;
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <sys/param.h>
#include <sys/wait.h>
#include <sys/resource.h>
int pgm_argc;
char **pgm_argv;
char **pgm_envp;
int opt_s;
char *opt_R;
size_t envlen;
size_t totlen;
long arg_max;
size_t lo;
size_t hi;
int status;
size_t
argvlen(char **argv)
{
size_t totlen = 0;
for (; *argv != NULL; ++argv) {
size_t slen = strlen(*argv);
totlen += slen;
totlen += 1;
totlen += sizeof(char *);
}
totlen += sizeof(char *);
return totlen;
}
size_t
lenall(int argc,char **argv,char **envp)
{
size_t totlen = 0;
size_t avlen = argvlen(argv);
avlen += sizeof(argv);
totlen += avlen;
size_t envlen = argvlen(envp);
envlen += sizeof(envp);
totlen += envlen;
totlen += sizeof(argc);
return totlen;
}
char *
strmake(size_t explen)
{
char *bp;
char *buf;
explen -= sizeof(char *);
explen -= 1;
buf = malloc(explen + 1);
for (bp = buf; explen > 0; --explen, ++bp)
*bp = (explen % 26) + 'A';
*bp = 0;
return buf;
}
void
doexec(size_t totlen)
{
size_t explen;
int sverr;
char *argv[4];
explen = totlen;
explen -= envlen;
argv[0] = pgm_argv[0];
argv[1] = "-s";
argv[2] = strmake(explen);
argv[3] = NULL;
pid_t pid = fork();
do {
if (pid == 0) {
printf("%zu %zu %zu\n",lo,totlen,hi);
execvpe(argv[0],argv,pgm_envp);
sverr = errno;
status = sverr << 8;
printf("%8.8X %d -- %s\n",status,sverr,strerror(sverr));
exit(sverr);
break;
}
waitpid(pid,&status,0);
free(argv[2]);
} while (0);
}
int
main(int argc,char **argv,char **envp)
{
char *cp;
size_t totlen;
pgm_argc = argc;
pgm_argv = argv;
pgm_envp = envp;
setlinebuf(stdout);
envlen = argvlen(envp);
arg_max = sysconf(_SC_ARG_MAX);
#if 0
totlen = lenall(argc,argv,envp);
printf("%zu\n",totlen);
#endif
--argc;
++argv;
//printf("main: '%s'\n",*argv);
for (; argc > 0; --argc, ++argv) {
cp = *argv;
if (*cp != '-')
break;
cp += 2;
switch (cp[-1]) {
case 's':
opt_s = 1;
break;
case 'R':
opt_R = cp;
break;
}
}
// slave just exits
if (opt_s)
exit(0);
if (opt_R != NULL) {
size_t Rsize = strtol(opt_R,&cp,10);
switch (*cp) {
case 'K':
case 'k':
Rsize *= 1024;
break;
case 'M':
case 'm':
Rsize *= 1024;
Rsize *= 1024;
break;
}
printf("stksiz: %zu (ARG)\n",Rsize);
struct rlimit rlim;
getrlimit(RLIMIT_STACK,&rlim);
printf("stksiz: %lu %lu (OLD)\n",rlim.rlim_cur,rlim.rlim_max);
rlim.rlim_cur = Rsize;
setrlimit(RLIMIT_STACK,&rlim);
getrlimit(RLIMIT_STACK,&rlim);
printf("stksiz: %lu %lu (NEW)\n",rlim.rlim_cur,rlim.rlim_max);
}
printf("arg_lgx: %ld\n",arg_lgx);
printf("arg_max: %ld\n",arg_max);
printf("envlen: %zu\n",envlen);
lo = 32;
hi = 100000000;
while (lo < hi) {
size_t mid = (lo + hi) / 2;
doexec(mid);
if (status == 0)
lo = mid + 1;
else
hi = mid - 1;
}
return 0;
}
Here's the program output:
arg_lgx: 131072
arg_max: 2097152
envlen: 3929
32 50000016 100000000
00000700 7 -- Argument list too long
32 25000023 50000015
00000700 7 -- Argument list too long
32 12500027 25000022
00000700 7 -- Argument list too long
32 6250029 12500026
00000700 7 -- Argument list too long
32 3125030 6250028
00000700 7 -- Argument list too long
32 1562530 3125029
00000700 7 -- Argument list too long
32 781280 1562529
00000700 7 -- Argument list too long
32 390655 781279
00000700 7 -- Argument list too long
32 195343 390654
00000700 7 -- Argument list too long
32 97687 195342
97688 146515 195342
00000700 7 -- Argument list too long
97688 122101 146514
122102 134308 146514
134309 140411 146514
00000700 7 -- Argument list too long
134309 137359 140410
00000700 7 -- Argument list too long
134309 135833 137358
00000700 7 -- Argument list too long
134309 135070 135832
00000700 7 -- Argument list too long
134309 134689 135069
134690 134879 135069
134880 134974 135069
134975 135022 135069
00000700 7 -- Argument list too long
134975 134998 135021
134999 135010 135021
00000700 7 -- Argument list too long
134999 135004 135009
135005 135007 135009
135008 135008 135009
UPDATE:
The variation you're seeing is due to ASLR (address space layout randomization). It randomizes the starting addresses of various sections of a program/process as a security mitigation.
There are a few methods to disable ASLR:
The /proc/sys/kernel/randomize_va_space file.
The personality syscall.
The setarch program, which uses the syscall method to invoke a subprogram in a manner similar to a shell.
See: https://askubuntu.com/questions/318315/how-can-i-temporarily-disable-aslr-address-space-layout-randomization and Disable randomization of memory addresses
ASLR sets random starting positions for the starting/highest stack address, envp, argv, and the starting stack position/frame given to main.
What appears to be "unused" space is a function of that placement and padding/alignment. So, the space isn't really unused (i.e. it isn't potentially usable).
Even with the exact same arguments passed to a child, the addresses change with ASLR on.
I knew about ASLR, but wasn't sure if it applied here (on the stack) [at first].
Before I figured out the connection, I enhanced my program to look at and compare some of these various addresses and offsets between them.
With ASLR on, however, if we run the child multiple [many ;-)] times, even if two or more runs happen to match on some of the same starting addresses (e.g. highest stack address) other parameters can still vary independently.
So, I enhanced the program to optionally disable ASLR via the personality syscall, and, when disabled, each run has the same placement and offsets.
My refactored program is at the limit of what can be posted in a code block here, so here's a link: https://pastebin.com/gYwRFvcv [I don't normally do this--see the section below as to why].
There are many options to this program as I performed a number of experiments before reaching my conclusions.
The -A option will disable ASLR. Consider running it with -x100000 -Ma@ [with/without] the -A.
Another good combo is adding -L to the above. This overrides the binary search in favor of a single argument length that is within a reasonable size.
See the comments in the code for more information.
With that, you can experiment further if necessary [or get some ideas for modifying your own program].
Upvotes: 7