Eldad

Reputation: 51

How to get the page size of a specific address programmatically?

I am looking for a way to implement a function that takes an address and tells the page size used at that address. One solution looks up the address in the segments listed in /proc/PID/smaps and returns the value of "KernelPageSize:". This solution is very slow because it involves reading a file linearly, and that file might be long. I need a faster and more efficient solution.

Is there a system call for this (something like int getpagesizefromaddr(void *addr);)? If not, is there a way to deduce the page size?

Upvotes: 5

Views: 7260

Answers (1)

Nominal Animal

Reputation: 39308

Many Linux architectures support "huge pages", see Documentation/vm/hugetlbpage.txt for detailed information. On x86-64, for example, sysconf(_SC_PAGESIZE) reports 4096 as page size, but 2097152-byte huge pages are also available. From the application's perspective, this rarely matters; the kernel is perfectly capable of converting from one page type to another as needed, without the userspace application having to worry about it.

However, for specific workloads the performance benefits are significant. This is why transparent huge page support (see Documentation/vm/transhuge.txt) was developed. It is especially noticeable in virtual environments, i.e. when the workload runs in a guest environment. The madvise() advice flags MADV_HUGEPAGE and MADV_NOHUGEPAGE allow an application to tell the kernel about its preferences, so that mmap(...MAP_HUGETLB...) is not the only way to obtain these performance benefits.

I personally assumed Eldad's question was related to a workload running in a guest environment, where the point is to observe the page mapping types (normal or huge page) while benchmarking, in order to find the most effective configuration for specific workloads.

Let's dispel all misconceptions by showing a real-world example, huge.c:

#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

#define  PAGES 1024

int main(void)
{
    FILE   *in;
    void   *ptr;
    size_t  page;

    page = (size_t)sysconf(_SC_PAGESIZE);

    /* Try to map PAGES pages of anonymous memory, backed by huge pages. */
    ptr = mmap(NULL, PAGES * page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, (off_t)0);
    if (ptr == MAP_FAILED) {
        fprintf(stderr, "Cannot map %ld pages (%ld bytes): %s.\n",
                (long)PAGES, (long)(PAGES * page), strerror(errno));
        return 1;
    }

    /* Dump /proc/self/smaps to standard out. */
    in = fopen("/proc/self/smaps", "rb");
    if (!in) {
        fprintf(stderr, "Cannot open /proc/self/smaps: %s.\n", strerror(errno));
        return 1;
    }
    while (1) {
        char *line, buffer[1024];

        line = fgets(buffer, sizeof buffer, in);
        if (!line)
            break;

        if ((line[0] >= '0' && line[0] <= '9') ||
            (line[0] >= 'a' && line[0] <= 'f') ||
            (strstr(line, "Page")) ||
            (strstr(line, "Size")) ||
            (strstr(line, "Huge"))) {
            fputs(line, stdout);
            continue;
        }
    }

    fclose(in);
    return 0;
}

The above allocates 1024 pages using huge pages, if possible. (On x86-64, one huge page is 2 MiB or 512 normal pages, so this should allocate two huge pages' worth, or 4 MiB, of private anonymous memory. Adjust the PAGES constant if you run on a different architecture.)

Make sure huge pages are enabled by verifying /proc/sys/vm/nr_hugepages is greater than zero. On most systems it defaults to zero, so you need to raise it, for example using

sudo sh -c 'echo 10 > /proc/sys/vm/nr_hugepages'

which tells the kernel to keep a pool of 10 huge pages (20 MiB on x86-64) available.

Compile and run the above program,

gcc -W -Wall -O3 huge.c -o huge && ./huge

and you will obtain an abbreviated /proc/PID/smaps output. On my machine, the interesting part contains

2aaaaac00000-2aaaab000000 rw-p 00000000 00:0c 21613022   /anon_hugepage (deleted)
Size:               4096 kB
AnonHugePages:         0 kB
KernelPageSize:     2048 kB
MMUPageSize:        2048 kB

which obviously differs from the typical parts, e.g.

01830000-01851000 rw-p 00000000 00:00 0   [heap]
Size:                132 kB
AnonHugePages:         0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB

The exact format of the complete /proc/self/smaps file is described in man 5 proc, and is quite straightforward to parse. Note that this is a pseudofile generated by the kernel, so it is never localized; the whitespace characters are HT (code 9) and SP (code 32), and newline is LF (code 10).


My recommended approach would be to maintain a structure describing the mappings, for example

struct region {
    size_t  start;    /* first byte in region at (void *)start */
    size_t  length;   /* last byte in region at (void *)(start + length - 1) */
    size_t  pagesize; /* KernelPageSize field */
};

struct maps {
    size_t           length;   /* of /proc/self/smaps */
    unsigned long    hash;     /* fast hash, say DJB XOR */
    size_t           count;    /* number of regions */
    pthread_rwlock_t lock;     /* region array lock */
    struct region   *region;
};

where the lock member is only needed if it is possible that one thread examines the region array while another thread is updating or replacing it.

The idea is that at desired intervals, the /proc/self/smaps pseudofile is read, and a fast, simple hash (or CRC) is calculated. If the length and the hash match, then assume mappings have not changed, and reuse the existing information. Otherwise, the write lock is taken (remember, the information is already stale), the mapping information parsed, and a new region array is generated.

If multithreaded, the lock member allows multiple concurrent readers, but protects against using a discarded region array.

Note: When calculating the hash, you can also count the number of map entries, as property lines all begin with an uppercase ASCII letter (A-Z, codes 65 to 90). In other words, the number of lines that begin with a hex digit (0-9, codes 48 to 57, or lowercase a-f, codes 97 to 102) is the number of memory regions described.


Of the functions provided by the C library, mmap(), munmap(), mremap(), madvise() (and posix_madvise()), mprotect(), malloc(), calloc(), realloc(), free(), brk(), and sbrk() may change the memory mappings (although I'm not certain this list contains them all). These library calls can be interposed, and the memory region list updated after each (successful) call. This should allow an application to rely on the memory region structures for accurate information.

Personally, I would create this facility as a preload library (loaded using LD_PRELOAD). That allows easily interposing the above functions with just a few lines of code: the interposed function calls the original function, and if successful, calls an internal function that reloads the memory region information from /proc/self/smaps. Care should be taken to call the original memory management functions, and to keep errno unchanged; otherwise it should be quite straightforward. I personally would also avoid using library functions (including string.h) to parse the fields, but I am overly careful anyway.

The interposed library would obviously also provide the function to query the page size at a specific address, say pagesizeat(). (If your application exports a weak version that always returns -1 with errno==ENOTSUP, your preload library can override it, and you don't need to worry about whether the preload library is loaded or not -- if not, the function will just return an error.)

Questions?

Upvotes: 5
