Tavo

Reputation: 181

Systemtap - Calling a syscall from kernel space

I'm trying to create a hard link by calling sys_link directly from SystemTap embedded C code. The code looks roughly like this:

function sys_link:long(oldname, newname) %{  /* pure */
    int error;
    mm_segment_t old_fs;

    old_fs = get_fs();
    set_fs(get_ds());

    error = psys_link(STAP_ARG_oldname, STAP_ARG_newname);

    set_fs(old_fs);

    STAP_RETURN(error);
%}

sys_link is not exported by the kernel, so psys_link is resolved with kallsyms_lookup_name() at initialisation, and I have verified that the address resolves correctly. The syscall appears to be called, but it never returns.

*I DO KNOW it's not the best idea to call syscalls from kernel space, but trust me, I need to do this ;)*

I also ran a simpler test that creates a file by calling filp_open, which is exported by the kernel and is not a syscall at all, just an ordinary kernel function. The result was the same:

function myopen:long(newname) %{  /* pure */
    struct file *file;
    mm_segment_t old_fs = get_fs();

    set_fs(get_ds());

    file = filp_open(STAP_ARG_newname, O_WRONLY|O_CREAT, 0644);

    set_fs(old_fs);

    STAP_RETURN(1);
%}

Any clue why the kernel freezes?

UPDATE: The syscall and the function are called in the context of a syscall.open.return probe. As discussed in one of the comments, SystemTap return probes are implemented with kretprobes, which replace the function's return address with a trampoline. As far as I understand, that means the syscall routine has already finished and should have released any lock related to the filesystem itself, but I'm probably missing something.

Debugging the kernel at that point gives me the following traceback. Apparently, the deadlock is on a kprobes lock (kretprobe_table_locks):

>>> info threads
  Id   Target Id         Frame 
* 1    Thread 1 (CPU#0 [running]) __loop_delay () at arch/arm/lib/delay-loop.S:42
  2    Thread 2 (CPU#1 [running]) __loop_delay () at arch/arm/lib/delay-loop.S:42
  3    Thread 3 (CPU#2 [running]) __loop_delay () at arch/arm/lib/delay-loop.S:42
  4    Thread 4 (CPU#3 [running]) arch_spin_lock (lock=<optimised out>) at ./arch/arm/include/asm/spinlock.h:91

>>> thread 4
[Switching to thread 4 (Thread 4)]
#0  arch_spin_lock (lock=<optimised out>) at ./arch/arm/include/asm/spinlock.h:91
91          wfe();

>>> bt
#0  arch_spin_lock (lock=<optimised out>) at ./arch/arm/include/asm/spinlock.h:91
#1  do_raw_spin_lock_flags (flags=<optimised out>, lock=<optimised out>) at ./include/linux/spinlock.h:155
#2  __raw_spin_lock_irqsave (lock=<optimised out>) at ./include/linux/spinlock_api_smp.h:121
#3  _raw_spin_lock_irqsave (lock=0xc1541f80 <kretprobe_table_locks+2240>) at kernel/locking/spinlock.c:159
#4  0xc0412d18 in kretprobe_table_lock (flags=<optimised out>, hash=<optimised out>) at kernel/kprobes.c:1113
#5  kprobe_flush_task (tk=0xed165b00) at kernel/kprobes.c:1158
#6  0xc03814f8 in finish_task_switch (prev=0xed165b00) at kernel/sched/core.c:2783
#7  0xc0c19c38 in context_switch (cookie=..., next=<optimised out>, prev=<optimised out>, rq=<optimised out>) at kernel/sched/core.c:2902
#8  __schedule (preempt=<optimised out>) at kernel/sched/core.c:3402
#9  0xc0c1a1a4 in schedule () at kernel/sched/core.c:3457
#10 0xc0c1a54c in schedule_preempt_disabled () at kernel/sched/core.c:3490
#11 0xc03a23dc in cpu_idle_loop () at kernel/sched/idle.c:273
#12 cpu_startup_entry (state=<optimised out>) at kernel/sched/idle.c:302
#13 0xc031206c in secondary_start_kernel () at arch/arm/kernel/smp.c:412
#14 0x60301dec in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

NOTE: This traceback is from an ARM machine, but the same thing happens on i386.

Upvotes: 2

Views: 756

Answers (1)

fche

Reputation: 2790

SystemTap probe handlers generally run in an atomic context, which means that preemption and/or interrupts are disabled. If you manage to call a kernel function from such a context, the target function had better be similarly "atomic", i.e., it must never take any new locks or block. Both sys_link and filp_open can block, so neither is safe to call from a probe handler.
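
One way to respect that constraint, sketched here as an assumption and not as something from the original post, is to keep the probe handler trivial (for example, have it print the two paths with printf) and create the link from an ordinary user-space process, where link(2) is free to block. The tab-separated record format and the names link_from_line and process_stream below are hypothetical, chosen only for illustration.

```c
/* Sketch only: user-space side of a "print in the probe, link in user
 * space" arrangement. The "oldpath<TAB>newpath" record format and the
 * function names are assumptions for illustration. Paths that contain
 * tabs or newlines are not handled. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Parse one "oldpath<TAB>newpath" record (modified in place) and create
 * the hard link. Returns 0 on success, -1 with errno set on failure. */
int link_from_line(char *line)
{
    char *tab;

    line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */

    tab = strchr(line, '\t');
    if (tab == NULL || tab == line || tab[1] == '\0') {
        errno = EINVAL;                 /* malformed record */
        return -1;
    }
    *tab = '\0';                        /* split into two C strings */

    /* Ordinary process context: link(2) may block or take locks. */
    return link(line, tab + 1);
}

/* Handle every record read from `in`; returns the number of failures. */
int process_stream(FILE *in)
{
    char buf[8192];
    int failures = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (link_from_line(buf) != 0) {
            perror("link");
            failures++;
        }
    }
    return failures;
}
```

A main() that simply calls process_stream(stdin) turns this into a filter, so the script's output can be piped into it. The trade-off is that the link is created slightly after the open() returns instead of synchronously inside the probe.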

Upvotes: 1
