Reputation: 137910
The x86-64 instruction set adds more registers and other improvements to help streamline executable code. However, in many applications the increased pointer size is a burden. The extra, unused bytes in every pointer clog up the cache and might even overflow RAM. GCC, for example, builds with the -m32 flag, and I assume this is the reason.
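Here is a minimal C++ sketch of why pointer width matters for the cache. The struct names are hypothetical. It compares a binary-tree node linked by native pointers with the same node linked by 32-bit indices into one node array. On x86-64 the first takes 24 bytes (two 8-byte links, a 4-byte key and 4 bytes of padding) and the second takes 12, so roughly twice as many nodes fit in a 64-byte cache line.

```cpp
#include <cstdint>

// Node linked by native pointers: 8 bytes per link on x86-64,
// plus padding so the struct size is a multiple of 8.
struct WideNode {
    WideNode* left;
    WideNode* right;
    std::int32_t key;
};

// Same node linked by 32-bit indices into a shared node array
// (hypothetical layout; e.g. 0xFFFFFFFF could be reserved as "null").
struct CompactNode {
    std::uint32_t left;
    std::uint32_t right;
    std::int32_t key;
};
```

The price is that an index only means something relative to its array, and a single array can hold at most 2^32 nodes.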
It's possible to load a 32-bit value and treat it as a pointer. This doesn't require extra instructions: just load or compute the 32 bits and load from the resulting address. The trick won't be portable, though, because platforms have different memory maps. On Mac OS X, the entire low 4 GiB of address space is reserved. Still, for one program I wrote, hackishly adding 0x100000000L to 32-bit "addresses" before use improved performance greatly over both true 64-bit addresses and compiling with -m32.
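The hack from the question could be sketched like this. It assumes a 64-bit target on which everything the program references lies in the 4 GiB window starting at 0x100000000, which is the constant from the question. kBase, compress and expand are illustrative names. The checks in compress only catch addresses outside that window, and nothing here allocates memory inside it.

```cpp
#include <cassert>
#include <cstdint>

// Assumes 64-bit pointers; the whole point is to avoid storing them.
static_assert(sizeof(void*) == 8, "this sketch assumes a 64-bit target");

// Start of the 4 GiB window (the low 4 GiB is reserved on Mac OS X).
constexpr std::uint64_t kBase = 0x100000000ULL;

// Store a pointer as a 32-bit offset from kBase.
inline std::uint32_t compress(const void* p) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    assert(addr >= kBase && addr - kBase <= UINT32_MAX && "outside the 4 GiB window");
    return static_cast<std::uint32_t>(addr - kBase);
}

// Rebuild the full pointer with one 64-bit add; if the base is kept in a
// register, the add can fold into a [base + index] addressing mode.
inline void* expand(std::uint32_t off) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(kBase + off));
}
```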
Is there any fundamental impediment to having a 32-bit, x86-64 platform? I suppose that supporting such a chimera would add complexity to any operating system, and anyone wanting that last 20% should just Make it Work™, but it still seems that this would be the best fit for a variety of computationally intensive programs.
Upvotes: 13
Views: 2904
Reputation: 41962
Yes, you can limit the program to the first 2 or 4 GB of address space, or use a 64-bit base with a 32-bit (or smaller) offset.
As Mysticial commented above, ICC can even do that automatically. It has the -auto-ilp32 / /Qauto-ilp32 option to use 32-bit pointers in 64-bit mode where applicable:
Instructs the compiler to analyze the program to determine if there are 64-bit pointers that can be safely shrunk into 32-bit pointers and if there are 64-bit longs (on Linux* systems) that can be safely shrunk into 32-bit longs.
But if you don't have access to ICC, or want more control over the generated code, then on Linux there's the x32 ABI, as others have mentioned.
On Windows there's no equivalent of the Linux x32 ABI, but you can still use 32-bit pointers by disabling the /LARGEADDRESSAWARE flag, which is enabled for x86-64 binaries by default:
By default, 64-bit Microsoft Windows-based applications have a user-mode address space of several terabytes. For precise values, see Memory Limits for Windows and Windows Server Releases. However, applications can specify that the system should allocate all memory for the application below 2 gigabytes. This feature is beneficial for 64-bit applications if the following conditions are true:
- A 2 GB address space is sufficient.
- The code has many pointer truncation warnings.
- Pointers and integers are freely mixed.
- The code has polymorphism using 32-bit data types.
All pointers are still 64-bit pointers, but the system ensures that every memory allocation occurs below the 2 GB limit, so that if the application truncates a pointer, no significant data is lost. Pointers can be truncated to 32-bit values, then extended to 64-bit values by either sign extension or zero extension.
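The remark about sign versus zero extension explains why the limit is 2 GB rather than 4 GB. Below 0x80000000, bit 31 of a truncated address is always clear, so both extensions give back the original value. The following small sketch works on plain integers and uses hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Truncate a 64-bit address to 32 bits. This is only lossless below 2 GB,
// which is what disabling /LARGEADDRESSAWARE guarantees for allocations.
inline std::uint32_t truncate_addr(std::uint64_t addr) {
    assert(addr < 0x80000000ULL && "address above the 2 GB limit");
    return static_cast<std::uint32_t>(addr);
}

inline std::uint64_t zero_extend(std::uint32_t v) {
    return v;
}

// Sign extension written with an explicit mask so it is well defined:
// if bit 31 is set, the upper 32 bits are filled with ones.
inline std::uint64_t sign_extend(std::uint32_t v) {
    return (v & 0x80000000u) ? (0xFFFFFFFF00000000ULL | v) : v;
}
```

For an address such as 0x90000000, which is above 2 GB, the two extensions disagree. That is exactly the case the 2 GB limit rules out.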
Of course there's no direct compiler support like GCC's -mx32 option, so you may need to handle pointers manually every time you store a pointer to memory or dereference it. The simplest solution is to write a class that wraps a 32-bit pointer and handles that. Luckily MS also has experience with mixing 32-bit and 64-bit pointers on the same architecture, so there are plenty of supporting keywords/macros:
- POINTER_32 / __ptr32
- POINTER_64 / __ptr64
- POINTER_SIGNED / __sptr
- POINTER_UNSIGNED / __uptr
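The wrapper class suggested above might look like the minimal Ptr32<T> below, which is a hypothetical name. It assumes every pointer it receives is below 4 GB, as it would be with /LARGEADDRESSAWARE disabled, and it zero-extends on the way out. The MSVC keywords listed above (__ptr32 and friends) are the compiler-specific alternative, and this portable sketch does not use them.

```cpp
#include <cassert>
#include <cstdint>

// Stores a pointer in 4 bytes. Assumes all pointers it sees are below 4 GB.
template <typename T>
class Ptr32 {
public:
    Ptr32() = default;
    explicit Ptr32(T* p) : bits_(narrow(p)) {}

    // Zero-extend the stored 32 bits back to a full pointer.
    T* get() const { return reinterpret_cast<T*>(static_cast<std::uintptr_t>(bits_)); }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
    explicit operator bool() const { return bits_ != 0; }

private:
    // Truncate, asserting that no significant bits are lost.
    static std::uint32_t narrow(T* p) {
        auto addr = reinterpret_cast<std::uintptr_t>(p);
        assert(addr <= UINT32_MAX && "pointer does not fit in 32 bits");
        return static_cast<std::uint32_t>(addr);
    }

    std::uint32_t bits_ = 0;  // 0 doubles as the null pointer
};
```

A struct holding several Ptr32 members instead of raw pointers shrinks accordingly. The round trip through an integer is implementation-defined in C++, but it behaves as expected on mainstream compilers.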
You can also force all memory allocations to happen below the 4 GB mark and handle everything manually.
However, limiting the program to the first 2/4 GB might not be feasible, either because that isn't enough memory or because it reduces the effectiveness of ASLR. Instead, you can tell the OS to allocate memory around some 64-bit base address. This way you can use multiple bases to cover an address space larger than 4 GB.
Google's V8 engine uses this to compress pointers to 32 bits, which both saves memory and improves performance. See the comparison in memory and performance improvement here. They even discuss a nice optimization: keeping the base in the FS/GS segment register to free up another general-purpose register.
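The following sketch shows the base-plus-offset idea as a tiny bump allocator. It is in the spirit of V8's pointer compression but is not copied from it. Cage is a hypothetical name, and a plain heap buffer stands in for the large virtual-memory reservation a real engine would make. Each Cage instance is one base, so several instances give you the "multiple bases" mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// One contiguous region; objects inside it are referenced by 32-bit
// offsets from the region's base instead of by 64-bit pointers.
class Cage {
public:
    explicit Cage(std::size_t bytes)
        : base_(new std::byte[bytes]), size_(bytes) {
        assert(bytes <= UINT32_MAX && "offsets must fit in 32 bits");
    }

    // Bump-allocate n bytes and return the compressed reference (offset).
    // Offset 0 is never handed out, so it can serve as "null".
    std::uint32_t allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        std::size_t off = (next_ + align - 1) / align * align;
        if (off + n > size_) throw std::bad_alloc();
        next_ = off + n;
        return static_cast<std::uint32_t>(off);
    }

    // Decompress: base + 32-bit offset.
    void* decompress(std::uint32_t off) const { return base_.get() + off; }

    // Compress: the pointer must lie inside this cage.
    std::uint32_t compress(const void* p) const {
        auto* b = static_cast<const std::byte*>(p);
        assert(b >= base_.get() && b < base_.get() + size_ && "pointer outside cage");
        return static_cast<std::uint32_t>(b - base_.get());
    }

private:
    std::unique_ptr<std::byte[]> base_;
    std::size_t size_;
    std::size_t next_ = 1;  // skip offset 0
};
```

Decompression is a single add of the base. That is why keeping the base in a dedicated register, or in a segment register as the V8 write-up discusses, makes it cheap.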
Or, if your pointers are always aligned, you can drop the low bits to address a larger amount of memory, like the JVM's "compressed Oops", which always address 8-byte-aligned objects.
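With 8-byte alignment, the low 3 bits of every object offset are zero. Shifting them out lets a 32-bit value span 2^32 * 8 bytes = 32 GiB. The following sketch works on integer addresses with a hypothetical base and helper names. The JVM can also run with a zero base, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>

// log2 of the object alignment (8 bytes), as in compressed Oops.
constexpr unsigned kShift = 3;

// Encode an 8-byte-aligned address as a 32-bit "narrow" reference.
inline std::uint32_t encode(std::uint64_t base, std::uint64_t addr) {
    std::uint64_t off = addr - base;
    assert((off & ((1u << kShift) - 1)) == 0 && "object not 8-byte aligned");
    assert((off >> kShift) <= UINT32_MAX && "offset beyond 32 GiB");
    return static_cast<std::uint32_t>(off >> kShift);
}

// Decode: shift the low zero bits back in and add the base.
inline std::uint64_t decode(std::uint64_t base, std::uint32_t narrow) {
    return base + (static_cast<std::uint64_t>(narrow) << kShift);
}
```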
See also How does the compressed pointer implementation in V8 differ from JVM's compressed Oops?
Upvotes: 5
Reputation: 62106
I don't expect it to be very hard to support such a model in the OS. About the only thing that needs to change for processes in this model is page management: pages must be allocated below the 4 GB point. The kernel should also allocate its buffers from the first 4 GB of the virtual address space if it passes them to the application. The same applies to the loader that loads and starts applications. Other than that, a 64-bit kernel should be able to handle such apps without major modifications.
Compiler support shouldn't be a big issue either. It's mostly a matter of generating code that uses the extra CPU registers and their full 64 bits, and adding proper REX prefixes whenever needed.
Upvotes: 0
Reputation: 30449
There is an ABI called "x32" for Linux in development. It's a mix between x86_64 and ia32, similar to what you describe: a 32-bit address space combined with the full 64-bit register set. It needs a custom kernel, binutils and gcc.
Some SPEC runs indicate a performance improvement of about 30% in some benchmarks. See further information at https://sites.google.com/site/x32abi/
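GCC and Clang expose the data model of a build through predefined macros, which you can use to check what a build actually got. Under -mx32, __x86_64__ is still defined because the full 64-bit register set is available, but __ILP32__ is also defined, and pointers and long are 4 bytes. The following is a compile-time sketch, and kDataModel and kIsX32 are illustrative names. Building it with g++ -mx32 also assumes an x32-capable toolchain, C library and kernel.

```cpp
#if defined(__x86_64__) && defined(__ILP32__)
// x32: x86-64 instructions and registers, but 32-bit pointers and long.
constexpr bool kIsX32 = true;
constexpr const char* kDataModel = "x32 (ILP32 on x86-64)";
static_assert(sizeof(void*) == 4 && sizeof(long) == 4, "x32 is ILP32");
#elif defined(__x86_64__) || defined(_M_X64)
// Regular x86-64: LP64 on Linux/macOS, LLP64 on Windows; 8-byte pointers either way.
constexpr bool kIsX32 = false;
constexpr const char* kDataModel = "x86-64 with 64-bit pointers";
static_assert(sizeof(void*) == 8, "x86-64 pointers are 8 bytes");
#else
constexpr bool kIsX32 = false;
constexpr const char* kDataModel = "not x86-64";
#endif
```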
Upvotes: 14
Reputation: 146998
It's called "x86-32 emulation", or WOW64 on Windows (presumably something else on other OSes), and it's a hardware flag in the processor. No need for any user-mode tricks here.
Upvotes: -5