alexandernst

Reputation: 15109

Playing with syscall table from LKM

I'm overriding SYS_READ in the syscall table on Linux (3.x), but I'm having trouble when unloading the module. I first load my module, which finds the syscall table, makes it writable, and overrides SYS_READ with my own SYS_READ function (which does nothing more than call the original SYS_READ). Then I wait a few moments and unload the module. In the module's unload method I restore the original SYS_READ function in the syscall table and set the syscall table back to read-only.

The original SYS_READ function is restored properly, but I get this when I unload the module: http://pastebin.com/JyYpqYgL

What am I missing? Should I be doing something more after restoring the real SYS_READ?

EDIT: GitHub link to the project: https://github.com/alexandernst/procmon

EDIT:

This is how I get the syscall table address:

void **sys_call_table;

struct idt_descriptor{
    unsigned short offset_low;
    unsigned short selector;
    unsigned char zero;
    unsigned char type_flags;
    unsigned short offset_high;
} __attribute__ ((packed));


struct idtr{
    unsigned short limit;
    void *base;
} __attribute__ ((packed));


void *get_sys_call_table(void){
    struct idtr idtr;
    struct idt_descriptor idtd;
    void *system_call;
    unsigned char *ptr;
    int i;

    asm volatile("sidt %0" : "=m" (idtr));
    memcpy(&idtd, idtr.base + 0x80 * sizeof(idtd), sizeof(idtd));
    system_call = (void*)((idtd.offset_high<<16) | idtd.offset_low);
    for(ptr=system_call, i=0; i<500; i++){
        if(ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0x85)
            return *((void**)(ptr+3));
        ptr++;
    }

    return NULL;
}

sys_call_table = get_sys_call_table();

And this is how I set RW/RO:

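// Clear CR0.WP (bit 16) so the kernel can write to read-only pages.
// Returns the previous CR0 value so the caller can restore it later.
unsigned long set_rw_cr0(void){
    unsigned long cr0 = 0;
    unsigned long ret;
    asm volatile("movq %%cr0, %%rax" : "=a"(cr0));
    ret = cr0;
    cr0 &= 0xfffffffffffeffff; // mask off the WP bit
    asm volatile("movq %%rax, %%cr0" : : "a"(cr0));
    return ret;
}

// Write the saved CR0 value back, which re-enables write protection.
void set_ro_cr0(unsigned long val){
    asm volatile("movq %%rax, %%cr0" : : "a"(val));
}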

Finally, this is how I define my syscalls and change the syscall table:

asmlinkage ssize_t (*real_sys_read)(unsigned int fd, char __user *buf, size_t count);
asmlinkage ssize_t hooked_sys_read(unsigned int fd, char __user *buf, size_t count);

//set my syscall
real_sys_read = (void *)sys_call_table[__NR_read];
sys_call_table[__NR_read] = (void *)hooked_sys_read;

//restore real syscall
sys_call_table[__NR_read] = (void *)real_sys_read;

Upvotes: 3

Views: 1266

Answers (2)

Ilya Matveychikov

Reputation: 4024

If you unload a module that intercepts system calls, be aware that some process may still be inside the system call handler when your code (the module's text segment) is removed from memory. This leads to a page fault: when the process returns from a kernel function that slept back into your code, that code no longer exists.

So a correct module unloading scheme must check for processes that may be sleeping in hooked system calls. Unloading is possible only if no process is sleeping in a syscall hook.

UPD

Please see the patch linked below, which proves my theory. It adds an atomic counter that is incremented when hooked_sys_read is entered and decremented when it returns. As I supposed, there is a process still waiting in real_sys_read when your module is unloaded. The patch shows this with printk(read_counter): it prints 1 for me, which means that some caller has not yet decremented read_counter.

http://pastebin.com/1yLBuMDY
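The counting idea can be sketched outside the kernel as follows. This is only a minimal user-space illustration using C11 atomics, not the contents of the patch: read_fn, real_read, hooked_read, stub_read and safe_to_unload are hypothetical names, and a real module would use the kernel's own atomic_t helpers instead. Note also that a counter alone still leaves small windows (before the increment and after the decrement) in which a caller is executing hook code, so treat it as a diagnostic rather than a complete unload protocol.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handler type standing in for the sys_read signature. */
typedef long (*read_fn)(unsigned int fd, char *buf, size_t count);

static atomic_int read_counter;   /* callers currently inside the hook */
static read_fn real_read;         /* saved original handler */
static int observed_in_flight;    /* counter value seen from inside the stub */

/* Stand-in for the original handler; records the counter while "in flight". */
long stub_read(unsigned int fd, char *buf, size_t count)
{
    (void)fd;
    (void)buf;
    observed_in_flight = atomic_load(&read_counter);
    return (long)count;
}

/* The hook: count the caller in, call the original, count the caller out. */
long hooked_read(unsigned int fd, char *buf, size_t count)
{
    long ret;

    assert(real_read != NULL);
    atomic_fetch_add(&read_counter, 1);
    ret = real_read(fd, buf, count);   /* in the kernel this may sleep */
    atomic_fetch_sub(&read_counter, 1);
    return ret;
}

/* Unloading is only considered when nobody is inside the hook. */
bool safe_to_unload(void)
{
    return atomic_load(&read_counter) == 0;
}
```

A caller blocked in the original handler keeps the counter above zero, which is the situation the printk in the patch reports.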

Upvotes: 2

Mats Petersson

Reputation: 129334

Here's some random ramblings, I'm far from sure any/all of it makes much sense, but it's getting late, and I'd rather write it down and get to bed than try to figure out exactly which (if any) is actually the problem. Hopefully something will help:

I take it you have checked that your restore actually restores the pointer - e.g. print the content of sys_call_table[__NR_read]?

I would definitely restore CR0 by or-ing back the bit you cleared, rather than restoring an old value. It may not matter most of the time, but there are other bits in CR0 that may change from time to time. Probably only really the TS bit, but that's bad enough: getting some random restore of stale floating point state, or missing a floating point restore, is a bad thing [and guess how easy it is to figure out that some long-running math suddenly got completely incorrect results because your code unloaded a few hours earlier?]. That's almost certainly not why your code is crashing, but it will almost certainly cause problems at one point or another if you load/unload the module enough times. [Also, make sure you are not swapping between processors when you change CR0. It's probably best to do some sort of locking to ensure you stay on the same processor whilst doing the whole sys_call_table update.]

I think the reason your code is crashing, however, is lack of cache flushing (the OS isn't expecting this memory to change, and the process sees it as read-only, so it shouldn't need to be checked for invalidation). You need to flush the caches on all processors for the sys_call_table entry. I'm not sure what the easiest/best way to do that is. I think void flush_icache_range(unsigned long start, unsigned long end) is the call you need, but I'm not sure if that's a current or an old function. From here: https://www.kernel.org/doc/Documentation/cachetlb.txt
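To make the CR0 point concrete, here is a minimal sketch of the bit arithmetic only, written as plain functions so it can be tried in user space. The names CR0_WP, CR0_TS, cr0_clear_wp and cr0_set_wp are made up for illustration; in a module you would apply the same masks to the value read from CR0 (the kernel provides read_cr0(), write_cr0() and X86_CR0_WP for this) at the moment you restore, not to a value saved at load time.

```c
#include <assert.h>

#define CR0_TS (1UL << 3)    /* task-switched bit, may change at any time */
#define CR0_WP (1UL << 16)   /* write-protect bit */

static_assert(CR0_WP == 0x10000UL, "WP is bit 16 of CR0");

/* Value to write when making read-only pages writable: only WP changes. */
unsigned long cr0_clear_wp(unsigned long cr0)
{
    return cr0 & ~CR0_WP;
}

/* Value to write when re-protecting: OR WP back into the CURRENT value,
 * so bits that changed in the meantime (such as TS) are preserved. */
unsigned long cr0_set_wp(unsigned long cr0)
{
    return cr0 | CR0_WP;
}
```

If TS was set between the two writes, cr0_set_wp() on the freshly read value keeps it, whereas writing back the value saved at load time would silently clear it.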

As I said initially, this is more ramblings than actually looking into how things work deep inside the kernel, etc. Time for my beauty sleep - I need as much of that as I can get... ;)

Upvotes: 1
