Arnav_Garg

Reputation: 429

Process memory mapping in C++

#include <iostream>

int main(int argc, char** argv) {
  int* heap_var = new int[1];
  /*
   * Page size 4KB == 4*1024 == 4096
   */
  heap_var[1025] = 1;
  std::cout << heap_var[1025] << std::endl;
  return 0;
}

// Output: 1

In the code above, I allocated 4 bytes on the heap (a single int). Since the OS maps virtual memory to physical memory in pages (4KB each), I assume a 4KB block of my process's heap gets mapped to physical memory. As a test, I tried accessing other addresses within my allocated page, and it worked. However, I shouldn't have been allowed to access anything more than 4096 bytes from the start, and index 1025 is 4100 bytes from the start, since an int is 4 bytes.

I'm confused about why I can access an address 4*1025 bytes from the start of the heap block (beyond the size of the page that was allocated) without getting a segfault.

Thanks.

Upvotes: 0

Views: 794

Answers (1)

krb

Reputation: 16325

The platform allocator likely reserved far more than one page, since it plans to use that memory "bucket" for other allocations and probably keeps some internal state there. Even in release builds there is likely far more than a single page-sized chunk of virtual memory there. You also don't know where within its page the memory was allocated (you can find out by masking the low bits of the address), and without knowing the platform/arch (I'm assuming x86_64) there is no telling that the page is even 4KB; it could be a 2MB "huge" page or something similar.
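
The bit-masking mentioned above can be sketched roughly as follows. This assumes a POSIX system where sysconf(_SC_PAGESIZE) is available and where the page size is a power of two; the helper names page_size, offset_in_page and bytes_to_page_end are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unistd.h>  // sysconf, _SC_PAGESIZE (POSIX only, an assumption)

// Page size as reported by the OS (commonly 4096, but not guaranteed).
std::size_t page_size() {
  return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Offset of p within the page that contains it.
std::size_t offset_in_page(const void* p) {
  const std::size_t size = page_size();
  assert((size & (size - 1)) == 0);  // masking only works for powers of two
  return reinterpret_cast<std::uintptr_t>(p) & (size - 1);
}

// Number of bytes from p up to the end of the page that contains it.
std::size_t bytes_to_page_end(const void* p) {
  return page_size() - offset_in_page(p);
}
```

Calling bytes_to_page_end(heap_var) on the pointer from the question shows how far the block sits from the end of its own page. It says nothing about whether the next page is mapped, which in practice it often is, because the allocator usually owns a much larger region.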

But by accessing outside the array bounds you're triggering undefined behavior, which can show up as crashes on reads or data corruption on writes, or which may simply appear to work, as it did here.

Don't use memory that you don't own.
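
If the goal is to have an out-of-bounds index reported instead of silently "working", one option is to let the standard library do the check. Below is a minimal sketch using std::vector::at, which throws std::out_of_range for a bad index; the helper name try_read is made up for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Copies v[i] into out and returns true if i is in bounds.
// Returns false and leaves out untouched otherwise.
bool try_read(const std::vector<int>& v, std::size_t i, int& out) {
  try {
    out = v.at(i);  // at() throws std::out_of_range when i >= v.size()
    return true;
  } catch (const std::out_of_range&) {
    return false;
  }
}
```

With a one-element vector, try_read(v, 0, out) succeeds and try_read(v, 1025, out) returns false rather than touching memory the vector doesn't own. For raw new[] arrays there is no such check, but building with a tool such as AddressSanitizer (-fsanitize=address in GCC and Clang) will usually flag an access like heap_var[1025] at run time.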

I should also mention that this is likely unrelated to C++ itself, since the new[] operator usually just calls malloc/calloc behind the scenes in the core platform library (libSystem on OS X; glibc, musl or something else on Linux; or even an intercepting allocator). The segfaults you do get usually come from hitting guard pages around heap blocks or, in the absence of guard pages, from simply touching unmapped memory.

NB (don't try this at home): there are cases where you may intentionally rely on what is undefined behavior in general because, on a specific platform, you know exactly what is there. A good example is abusing the opaque pthread_t on Linux to get the tid without the overhead of an extra syscall, but you have to make sure you're using the right libc, the right build type of that libc, the right version of that libc, the right compiler it was built with, etc.

Upvotes: 2
