bpf how to inspect syscall arguments

Question

trace_output_kern.c traces sys_write syscall and prints the pid in userland:

#include <linux/ptrace.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size = sizeof(int),
    .value_size = sizeof(u32),
    .max_entries = 2,
};

SEC("kprobe/sys_write")
int bpf_prog1(struct pt_regs *ctx)
{
    struct S {
        u64 pid;
        u64 cookie;
    } data;

    data.pid = bpf_get_current_pid_tgid();
    data.cookie = 0x12345678;

    bpf_perf_event_output(ctx, &my_map, 0, &data, sizeof(data));

    return 0;
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

sys_write has the signature sys_write(unsigned int fd, const char __user *buf, size_t count), but currently we only see the PID. The point of tracing is to intercept the call and inspect its arguments, so I would like to see the arguments that are passed in as well.

If I extend struct S with a char array to hold the contents of buf, like this:

    struct S {
            u64 pid;
            u64 cookie;
            char bleh[128]; //<-- added this 
    } data;

it is throwing a fit:

/usr/src/linux-5.4/samples/bpf# ./trace_output
bpf_load_program() err=13
0: (bf) r6 = r1
1: (85) call bpf_get_current_pid_tgid#14
2: (b7) r1 = 305419896
3: (7b) *(u64 *)(r10 -136) = r1
4: (7b) *(u64 *)(r10 -144) = r0
5: (bf) r4 = r10
6: (07) r4 += -144
7: (bf) r1 = r6
8: (18) r2 = 0xffff8975bd44aa00
10: (b7) r3 = 0
11: (b7) r5 = 144
12: (85) call bpf_perf_event_output#25
invalid indirect read from stack off -144+16 size 144
processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
0: (bf) r6 = r1
1: (85) call bpf_get_current_pid_tgid#14
2: (b7) r1 = 305419896
3: (7b) *(u64 *)(r10 -136) = r1
4: (7b) *(u64 *)(r10 -144) = r0
5: (bf) r4 = r10
6: (07) r4 += -144
7: (bf) r1 = r6
8: (18) r2 = 0xffff8975bd44aa00
10: (b7) r3 = 0
11: (b7) r5 = 144
12: (85) call bpf_perf_event_output#25
invalid indirect read from stack off -144+16 size 144
processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

If sys_write is a bad example for this question, I have also been trying to trace sys_execve, whose argument list is

asmlinkage long sys_execve(const char __user *filename,
                const char __user *const __user *argv,
                const char __user *const __user *envp);

Please point me in the right direction, thanks!

Edit 1

How do I intercept the arguments that were passed to __x64_sys_execve?

When I try this below,

#include <linux/ptrace.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") my_map = {
        .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(u32),
        .max_entries = 2,
};

//SEC("kprobe/sys_write")
SEC("kprobe/__x64_sys_execve")

/* Signature of sys_execve: 
asmlinkage long sys_execve(const char __user *filename,
                const char __user *const __user *argv,
                const char __user *const __user *envp);
*/

int bpf_prog1(struct pt_regs *ctx, const char *filename)
{
        struct S {
                u64 pid;
                u64 cookie;
                char bleh[128];
        } data;

        data.pid = bpf_get_current_pid_tgid();
        data.cookie = 0x12345678;
        //bpf_get_current_comm(&data.bleh, 128);
        bpf_probe_read(&data.bleh, 128, (void *)filename);

        bpf_perf_event_output(ctx, &my_map, 0, &data, sizeof(data));

        return 0;
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

It blows up thusly:

/usr/src/linux-5.4/samples/bpf# ./borky
bpf_load_program() err=13
0: (bf) r6 = r2
R2 !read_ok
processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
0: (bf) r6 = r2
R2 !read_ok
processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Qeole · Accepted Answer

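The first part of your question was answered by pchaigno: if you extend struct S and then pass it to bpf_perf_event_output(ctx, &my_map, 0, &data, sizeof(data)) without initialising the new field, the verifier complains, because letting a program read uninitialised kernel stack memory is a security risk. That is what "invalid indirect read from stack off -144+16 size 144" is telling you: the helper is asked to read 144 bytes, but nothing was ever written from offset 16 onwards, which is where bleh starts. What you could do is, for example, zero the whole struct when declaring it: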

        struct S {
                u64 pid;
                u64 cookie;
                char bleh[128];
        } data = {0};
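
To see where the numbers in the verifier message come from, here is a small userspace sketch (not a BPF program). It assumes uint64_t as a stand-in for the kernel's u64, and the helper names event_size(), first_unwritten_offset(), init_event() and bleh_is_zeroed() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the question's struct S; uint64_t stands in for u64. */
struct S {
    uint64_t pid;
    uint64_t cookie;
    char bleh[128];
};

/* Number of bytes handed to bpf_perf_event_output() via sizeof(data). */
size_t event_size(void)
{
    return sizeof(struct S);
}

/* Offset of the first byte the original program never wrote to. */
size_t first_unwritten_offset(void)
{
    return offsetof(struct S, bleh);
}

/* Zero every byte first, then set the two fields the program fills in. */
void init_event(struct S *ev, uint64_t pid_tgid)
{
    assert(ev != NULL);
    memset(ev, 0, sizeof(*ev));
    ev->pid = pid_tgid;
    ev->cookie = 0x12345678;
}

/* Returns 1 if every byte of bleh is zero, 0 otherwise. */
int bleh_is_zeroed(const struct S *ev)
{
    size_t i;

    for (i = 0; i < sizeof(ev->bleh); i++) {
        if (ev->bleh[i] != 0)
            return 0;
    }
    return 1;
}
```

On a typical 64-bit build, event_size() is 144 and first_unwritten_offset() is 16, which lines up with "off -144+16 size 144" in the log. In the BPF program itself, the = {0} initialiser plays the role of the memset() in init_event().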

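Regarding the second part of your question, about sys_execve: you cannot receive the syscall arguments as extra parameters of bpf_prog1() the way you are trying to. Your function should take only the struct pt_regs *ctx. The "R2 !read_ok" error reflects this: when the program starts, only R1 (the context) is initialised, so the verifier rejects the read of a second parameter from R2.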

The confusion likely comes from the syntax used in bcc, where arguments are passed this way. It is important to understand that bcc rewrites parts of the program under the hood, including how the probed function's arguments are accessed.

What you can use instead is the set of PT_REGS_PARM*(ctx) macros, which are defined specifically to fetch the arguments of the probed function from the CPU registers that hold them. I think bcc also uses them when doing its rewriting job, but you wouldn't see it.
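
One caveat for the __x64_sys_execve case: as far as I know, on x86_64 kernels built with syscall wrappers (which should include 5.4), the __x64_sys_* functions take a single argument, a pointer to the struct pt_regs saved at syscall entry. PT_REGS_PARM1(ctx) would then give you that pointer rather than filename, and a second read is needed. The sketch below is a plain userspace model of that two-step lookup, not a loadable BPF program: struct model_regs, the MODEL_PARM* macros and model_probe_read() are hypothetical stand-ins for struct pt_regs, PT_REGS_PARM* and bpf_probe_read(), and the di/si/dx mapping is the x86_64 one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for x86_64 struct pt_regs: only the registers
 * that carry the first three arguments are modelled. */
struct model_regs {
    uint64_t di; /* 1st argument */
    uint64_t si; /* 2nd argument */
    uint64_t dx; /* 3rd argument */
};

/* Model of what PT_REGS_PARM1..3 expand to on x86_64. */
#define MODEL_PARM1(r) ((r)->di)
#define MODEL_PARM2(r) ((r)->si)
#define MODEL_PARM3(r) ((r)->dx)

/* Stand-in for bpf_probe_read(dst, size, src); here it is just memcpy(). */
static int model_probe_read(void *dst, int size, const void *src)
{
    memcpy(dst, src, (size_t)size);
    return 0;
}

/* Copy len bytes starting at execve's filename pointer into dst.
 * ctx models the kprobe context of the wrapper function. */
int read_execve_filename(const struct model_regs *ctx, char *dst, int len)
{
    struct model_regs inner;
    const void *inner_ptr;
    const void *filename;

    assert(ctx != NULL && dst != NULL && len > 0);

    /* Step 1: the wrapper's only argument points at the saved registers. */
    inner_ptr = (const void *)(uintptr_t)MODEL_PARM1(ctx);
    if (model_probe_read(&inner, (int)sizeof(inner), inner_ptr) != 0)
        return -1;

    /* Step 2: the syscall's own first argument lives in the saved regs. */
    filename = (const void *)(uintptr_t)MODEL_PARM1(&inner);
    return model_probe_read(dst, len, filename);
}
```

In the real program both steps would be bpf_probe_read() calls: first on (void *)PT_REGS_PARM1(ctx) to copy the saved registers, then on the di field of that copy to fetch the string into data.bleh (bpf_probe_read_str() may be a better fit for a NUL-terminated string). Treat this as an outline to check against your kernel, not as tested code.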
