Yong Fu

Reputation: 1

Why can't sys_enter_execve always get the program name (using bpf_get_current_comm and bpf_probe_read_str)?

I am learning eBPF programming. Sometimes my tracepoint cannot read the program name when the program is started with execve, although it works with execv and with syscall(SYS_execve, ...). The code is as follows:

  1. eBPF code
static u32 ebpf_getppid(void)
{
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    struct task_struct *parent = (struct task_struct *)BPF_CORE_READ(task, real_parent);

    return BPF_CORE_READ(parent, tgid);
}

SEC("tp/syscalls/sys_enter_execve")
int tracepoint__syscalls__sys_enter_execve(struct trace_event_raw_sys_enter *ctx)
{
    struct epm_command command = {};
    const char *filename = (const char *)BPF_CORE_READ(ctx, args[0]);
    const unsigned long *argv_ptr = (const unsigned long *)BPF_CORE_READ(ctx, args[1]);
    const unsigned long *envp_ptr = (const unsigned long *)BPF_CORE_READ(ctx, args[2]);
    char temp[128] = {0};

    for(int i = 0; i < 4; i++){
        bpf_printk("args[%d]: 0x%lx\n", i, BPF_CORE_READ(ctx, args[i]));
    }
    
    command.process_id = ebpf_getppid();
    command.timestamp = bpf_ktime_get_ns();
    bpf_get_current_comm(&command.process_name, sizeof(command.process_name));
    
    bpf_probe_read_str(&command.call_prog_name, sizeof(command.call_prog_name), filename);

    bpf_printk("Parent Process name: %s\n", command.process_name);
    bpf_printk("Call Process name: %s\n", command.call_prog_name);

    for(int i = 0; i < 64; i++) {
        unsigned long arg_ptr = 0;
        __builtin_memset(temp, 0, sizeof(temp));
        
        /* argv[i] is a pointer, not a string: copy the raw bytes */
        bpf_probe_read_user(&arg_ptr, sizeof(arg_ptr), &argv_ptr[i]);
        if(arg_ptr == 0) {
            break;
        }
        bpf_probe_read_str(temp, sizeof(temp), (void *)arg_ptr);
        bpf_printk("argv[%d]: %s\n", i, temp);
    }

    for(int i = 0; i < 64; i++) {
        unsigned long env_ptr = 0;
        __builtin_memset(temp, 0, sizeof(temp));
        
        /* envp[i] is a pointer, not a string: copy the raw bytes */
        bpf_probe_read_user(&env_ptr, sizeof(env_ptr), &envp_ptr[i]);
        if(env_ptr == 0) {
            break;
        }
        bpf_probe_read_str(temp, sizeof(temp), (void *)env_ptr);
        bpf_printk("envp[%d]: %s\n", i, temp);
    }

    bpf_map_update_elem(&epm_execve_map, &command.process_id, &command, BPF_ANY);

    return 0;
}
  2. User-level code that cannot get the program name
#include <unistd.h>

int main() {
    char *args[] = {"/usr/bin/ls", "-l", NULL, NULL};
    char *envp[] = {NULL};
    execve("/usr/bin/ls", args, envp);
    return 0;
}
  3. User-level code that can get the program name
#include <stdio.h>
#include <unistd.h>

int main() {
    char *args[] = {"/usr/bin/ls", "-l", NULL, NULL};
    char *envp[] = {NULL};
    printf("args addr: %p\n", args);
    printf("envp addr: %p\n", envp);
    execve("/usr/bin/ls", args, envp);
    return 0;
}

The only difference between the two user-level programs is that the second one calls printf to print the addresses of args and envp. What is the specific reason for this behavior?

I hope to get a correct answer to the question described above.

Upvotes: 0

Views: 39

Answers (1)

mozillazg

Reputation: 760

The behavior you’re seeing is expected and comes from how string literals are paged into memory. In char *args[] = {"/usr/bin/ls", "-l", NULL, NULL}, the array itself lives on the stack, but the string literals it points to are compiled into the binary’s read-only data, and the page holding them is not faulted into memory until something reads it. In the failing program nothing reads those strings before execve is called, so when tracepoint__syscalls__sys_enter_execve runs, the page is still not present. The BPF helper will not take a page fault to load it, so bpf_probe_read_str fails with -EFAULT (-14) and you get empty output.

When you add printf("args addr: %p\n", args), printf has to read its format string, which is itself a string literal and most likely sits on the same page as "/usr/bin/ls" and "-l". That read makes the kernel fault the page into RAM, and because memory is mapped in pages (not individual variables), the neighboring literals are available by the time your eBPF probe runs. This explains why adding printf "fixes" the issue.
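If you only need the test program to behave consistently, you can touch the strings explicitly instead of relying on a printf side effect. The following is a minimal sketch, not a definitive fix: prefault_string and prefault_vector are hypothetical helper names (they are not part of libc or libbpf), and the sketch assumes that reading a byte is enough to make the kernel map the page containing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): reads every byte of s up to
 * and including the terminating NUL through a volatile pointer, so the
 * compiler cannot drop the loads. Each load faults in the page that holds
 * the byte if it is not mapped yet. Returns the string length. */
size_t prefault_string(const char *s)
{
    const volatile char *p = s;
    size_t len = 0;

    assert(s != NULL);
    while (p[len] != '\0')
        len++;
    return len;
}

/* Hypothetical helper: touches every string of a NULL-terminated pointer
 * vector such as argv or envp. Returns the number of strings touched. */
size_t prefault_vector(char *const vec[])
{
    size_t n = 0;

    while (vec[n] != NULL) {
        prefault_string(vec[n]);
        n++;
    }
    return n;
}
```

In the failing program, calling prefault_vector(args) and prefault_string on the path pointer right before execve should have the same effect as the printf calls. This only helps for programs you control; it does not change the fact that the probe can see -EFAULT for arbitrary processes.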

This is a known behavior in eBPF tracing. As noted in this GitHub issue comment:

the data you're using isn't in memory yet. These static strings are compiled in and are not actually faulted into memory until they're accessed. The access won't happen until its read, which is after your bpftrace probe ran. BPF won't pull the data in so you get an EFAULT/-14.

By printing the values or just a random print of a constant string you pull the small amount of data into memory (as it goes by page, not by var) and then it works

For a deeper dive, see this blog post, which explores a similar case.

Upvotes: 2
