Reputation: 43327
I'm looking at this little program whose memory usage is unpredictable: it operates on directories that may contain anywhere from a handful of files to tens of millions, and it has to hold all the file names in RAM at once because of kernel behavior.
This snippet is not the program I'm writing. This shows why I have to store all the file names in memory at once:
DIR *dir = opendir(".");
for (struct dirent *entry; (entry = readdir(dir)); )
    unlink(entry->d_name); // DANGER DO NOT RUN ME
This snippet looks like it removes all the files from the current directory, but it doesn't. The fundamental problem on modern systems is that removing or changing directory entries while you are enumerating them destabilizes the enumeration, so you miss files.
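For contrast, here is a minimal sketch of the two-pass shape that forces everything into memory: snapshot all the names first, and only touch the directory once the enumeration is finished. This is still not the program I'm writing. `struct name_list`, `collect_names` and `remove_all_in` are hypothetical names made up for the illustration, and the `realloc`-grown list is just a stand-in for the big buffer discussed below.

```c
#define _POSIX_C_SOURCE 200809L /* strdup, dirfd, unlinkat */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical growable list of heap-allocated file names. */
struct name_list {
    char **names;
    size_t count;
    size_t capacity;
};

/* Pass 1: copy every entry name except "." and ".." into `out`.
 * Returns 0 on success, -1 on allocation failure. */
static int collect_names(DIR *dir, struct name_list *out)
{
    assert(dir != NULL && out != NULL);
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        if (out->count == out->capacity) {
            size_t new_cap = out->capacity ? out->capacity * 2 : 64;
            char **grown = realloc(out->names, new_cap * sizeof *grown);
            if (grown == NULL)
                return -1;
            out->names = grown;
            out->capacity = new_cap;
        }
        char *copy = strdup(entry->d_name);
        if (copy == NULL)
            return -1;
        out->names[out->count++] = copy;
    }
    return 0;
}

/* Pass 2 starts only after the enumeration has finished, so unlinking
 * cannot disturb it. Entries that unlinkat() refuses (subdirectories,
 * for example) are simply not counted.
 * Returns the number of entries removed, or -1 if listing failed. */
static long remove_all_in(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    struct name_list list = {0};
    int status = collect_names(dir, &list);
    long removed = 0;
    for (size_t i = 0; i < list.count; i++) {
        if (status == 0 && unlinkat(dirfd(dir), list.names[i], 0) == 0)
            removed++;
        free(list.names[i]);
    }
    free(list.names);
    closedir(dir);
    return status == 0 ? removed : -1;
}
```

Calling `remove_all_in(".")` would be the safe counterpart of the dangerous loop above. The cost is the one this question is about: the whole name list lives in memory between the two passes.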
Almost the entirety of the program's memory must consist of a single buffer containing all the file names in the directory, so I'm thinking of declaring a memory structure that looks like this (ASLR is on so 0 isn't really 0):
000000000 ELF header
000000118 Program code
000001000 work buffers
000002000 begin stretchy array
400002000 no man's land (mapped with 0 access so anything hitting it faults)
400003000 top of stack
400005000 bottom of stack
What does not need to be discussed: whether the program code area or the stack is too small. These constants are easily changed.
What does need to be discussed is the big stretchy array. Managing a discontiguous array is bloat, and we have an MMU so it doesn't need to be contiguous in physical memory. I've seen this trick done once long ago, on a system that didn't have to care because it was single user. I have no idea if this is going to cause a problem on a multi-user box.
The behavior I want here should be obvious. The memory isn't allocated until it's touched. Until then, it's nothing. So, how bad is it to declare a 16 GB BSS segment and only use what I actually need? Since it's so large, I need to be absolutely sure the stack doesn't run into it, so dynamically allocating the address space with mmap() is a non-starter.
Alternate hypothesis: in theory, if there's a way to say it in the PE format, I could just reserve the memory at startup and commit it with mmap as needed. According to the man page, though, there's no way to tell mmap to reserve 16 GB of address space without actually allocating any of it yet.
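To make the reserve-then-commit idea concrete, this is roughly the shape I mean, sketched in C for Linux. It leans on an assumption I have not verified for my case: that a `PROT_NONE` mapping created with `MAP_NORESERVE` acts as a pure address-space reservation, and that `mprotect` can later make part of it usable. `struct arena`, `arena_reserve` and `arena_commit` are hypothetical names, not anything from the real program.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS, MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stretchy arena: one contiguous range of address space,
 * of which only the first `committed` bytes are accessible. */
struct arena {
    unsigned char *base;
    size_t reserved;
    size_t committed;
};

/* Reserve address space only. PROT_NONE means any access faults.
 * Assumption: the kernel does not charge this range against memory. */
static int arena_reserve(struct arena *a, size_t reserve_bytes)
{
    void *p = mmap(NULL, reserve_bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->reserved = reserve_bytes;
    a->committed = 0;
    return 0;
}

/* Make at least the first `want_bytes` of the arena readable and
 * writable, rounding up to whole pages. Physical pages should still
 * only appear when they are first touched. */
static int arena_commit(struct arena *a, size_t want_bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(page != 0 && (page & (page - 1)) == 0);

    if (want_bytes > a->reserved)
        return -1;
    size_t target = (want_bytes + page - 1) & ~(page - 1);
    if (target > a->reserved)
        target = a->reserved;
    if (target <= a->committed)
        return 0; /* already committed far enough */
    if (mprotect(a->base + a->committed, target - a->committed,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    a->committed = target;
    return 0;
}
```

The intended use would be one `arena_reserve(&a, 16ULL << 30)` at startup and an `arena_commit` call each time the array needs to stretch. Whether this behaves any differently from a giant BSS segment on a multi-user box is the part I can't answer myself.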
I have written down the ELF headers with the empty program (mov al, 60 ; syscall) so it does run.
Upvotes: 0
Views: 43