Reputation: 525
One of the Linux kernel drivers I am developing uses network communication in the kernel (sock_create(), sock->ops->bind(), and so on).
The problem is that there will be multiple sockets to receive data from, so I need something that simulates a select() or poll() in kernel space. Since those functions use file descriptors, I cannot use the system calls unless I also use system calls to create the sockets, and that seems unnecessary since I am working in the kernel.
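For context, here is a minimal sketch of how one such in-kernel socket might be created and bound. It assumes a UDP socket, an arbitrary example port, and the sock_create_kern() signature of kernels 4.2 and later (older kernels omit the struct net argument):

    #include <linux/in.h>
    #include <linux/net.h>
    #include <net/net_namespace.h>
    #include <net/sock.h>

    static struct socket *ksock;

    static int make_ksocket(void)
    {
        struct sockaddr_in addr = {
            .sin_family      = AF_INET,
            .sin_addr.s_addr = htonl(INADDR_ANY),
            .sin_port        = htons(5555),    /* arbitrary example port */
        };
        int err;

        err = sock_create_kern(&init_net, AF_INET, SOCK_DGRAM,
                               IPPROTO_UDP, &ksock);
        if (err < 0)
            return err;

        err = kernel_bind(ksock, (struct sockaddr *)&addr, sizeof(addr));
        if (err < 0)
            sock_release(ksock);
        return err;
    }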
So I was thinking of wrapping the default sock->sk_data_ready handler in my own handler (custom_sk_data_ready()), which would unlock a semaphore. Then I can write my own kernel_select() function that tries to take the semaphore, blocking until it becomes available; that way the kernel thread sleeps until the semaphore is unlocked by custom_sk_data_ready(). Once kernel_select() acquires the semaphore, it releases it and calls custom_sk_data_ready() to relock it. So the only additional initialization is to run custom_sk_data_ready() before binding a socket, so that the first call to kernel_select() does not falsely trigger.
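A minimal sketch of that wrapping, using the one-argument sk_data_ready signature of Linux 3.15+ (older kernels also pass a byte count) and names of my own choosing (ready_sem, kernel_select(), hook_socket()):

    #include <linux/semaphore.h>
    #include <net/sock.h>

    static struct semaphore ready_sem;                /* sema_init(&ready_sem, 0) at init */
    static void (*orig_data_ready)(struct sock *sk);  /* saved default handler */

    static void custom_sk_data_ready(struct sock *sk)
    {
        orig_data_ready(sk);   /* preserve the default wakeup behavior */
        up(&ready_sem);        /* "unlock"; safe to call from softirq context */
    }

    /* Sleep until custom_sk_data_ready() releases the semaphore. */
    static int kernel_select(void)
    {
        return down_interruptible(&ready_sem);
    }

    static void hook_socket(struct socket *sock)
    {
        struct sock *sk = sock->sk;

        write_lock_bh(&sk->sk_callback_lock);
        orig_data_ready   = sk->sk_data_ready;
        sk->sk_data_ready = custom_sk_data_ready;
        write_unlock_bh(&sk->sk_callback_lock);
    }

Initializing the semaphore count to 0 (sema_init(&ready_sem, 0)) covers the initialization step above: the first call to kernel_select() blocks rather than falsely triggering.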
I see one possible problem: if multiple receives occur, then multiple calls to custom_sk_data_ready() will try to unlock the semaphore. So, to avoid losing those calls and to track which sock is being used, there will have to be a table or list of pointers to the sockets in use, and custom_sk_data_ready() will have to flag in that table/list which socket it was passed.
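The bookkeeping could look something like this (the ready_entry structure and mark_ready() helper are hypothetical; the point is that the flagging must not sleep, since sk_data_ready runs in softirq context):

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <net/sock.h>

    struct ready_entry {             /* hypothetical per-socket record */
        struct list_head  node;
        struct sock      *sk;
        bool              ready;     /* set by custom_sk_data_ready() */
    };

    static LIST_HEAD(ready_list);
    static DEFINE_SPINLOCK(ready_list_lock);

    static void mark_ready(struct sock *sk)
    {
        struct ready_entry *e;
        unsigned long flags;

        /* Must not sleep: called from the data-ready callback. */
        spin_lock_irqsave(&ready_list_lock, flags);
        list_for_each_entry(e, &ready_list, node) {
            if (e->sk == sk)
                e->ready = true;     /* kernel_select() scans for these */
        }
        spin_unlock_irqrestore(&ready_list_lock, flags);
    }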
Is this method sound? Or should I just struggle with the user/kernel space issue when using the standard system calls?
Initial Finding:
All callback functions in the sock structure are called in an interrupt context, which means they cannot sleep. To allow the main kernel thread to sleep on a list of ready sockets, mutexes are used, but custom_sk_data_ready() must act like a spinlock on those mutexes (calling mutex_trylock() repeatedly). (Note that mutex_trylock() is itself documented as not usable from interrupt context, so a semaphore or spinlock may be the safer primitive here.) This also means that any dynamic allocation must use the GFP_ATOMIC flag.
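As an illustration, any allocation made inside the non-sleeping callback has to look roughly like this (the ready_event record and note_ready() helper are hypothetical):

    #include <linux/slab.h>
    #include <net/sock.h>

    struct ready_event {             /* hypothetical per-event record */
        struct sock *sk;
    };

    static void note_ready(struct sock *sk)
    {
        struct ready_event *ev;

        /* Interrupt (softirq) context: GFP_KERNEL would be a bug here,
         * and GFP_ATOMIC allocations can fail, so check the result. */
        ev = kmalloc(sizeof(*ev), GFP_ATOMIC);
        if (!ev)
            return;
        ev->sk = sk;
        /* ... hand ev off to the sleeping reader ... */
    }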
Additional possibility:
For every open socket, replace the socket's sk_data_ready() with a custom one (custom_sk_data_ready()) and create a worker (struct work_struct) and a work queue (struct workqueue_struct). A common process_msg() function will be used for each worker. Create a kernel-module-level global list where each element holds a pointer to its socket and contains the worker structure. When data is ready on a socket, custom_sk_data_ready() will execute, find the list element with the matching socket, and call queue_work() with the list element's work queue and worker. The process_msg() function will then be called, and it can find the matching list element either through the contents of its struct work_struct * parameter (an address) or by using the container_of() macro to get the address of the list structure that holds the worker structure.
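A sketch of that arrangement, under the same caveats as before (sock_entry, process_msg(), and msg_wq are names I have invented; list locking is omitted for brevity):

    #include <linux/list.h>
    #include <linux/printk.h>
    #include <linux/workqueue.h>
    #include <net/sock.h>

    struct sock_entry {                 /* one element of the global list */
        struct list_head   node;
        struct socket     *sock;
        struct work_struct work;
    };

    static LIST_HEAD(sock_list);
    static struct workqueue_struct *msg_wq;  /* alloc_workqueue("msg_wq", 0, 0) at init */

    static void process_msg(struct work_struct *work)
    {
        struct sock_entry *e = container_of(work, struct sock_entry, work);

        /* Process context: kernel_recvmsg(e->sock, ...) may block here
         * and GFP_KERNEL allocations are allowed. */
        pr_info("data ready on socket %p\n", e->sock);
    }

    static void custom_sk_data_ready(struct sock *sk)
    {
        struct sock_entry *e;

        /* Softirq context: just find the entry and defer the real work. */
        list_for_each_entry(e, &sock_list, node) {
            if (e->sock->sk == sk) {
                queue_work(msg_wq, &e->work);
                break;
            }
        }
    }

    /* Per-socket setup: INIT_WORK(&e->work, process_msg), add e to
     * sock_list, then swap sk->sk_data_ready for custom_sk_data_ready(). */

One convenient property of this scheme: queue_work() does nothing and returns false if the work item is already pending, so back-to-back data-ready events on one socket are coalesced into a single process_msg() run (which should therefore drain the socket completely).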
Which technique is the most sound?
Upvotes: 12
Views: 3374
Reputation: 2720
Your second idea sounds more like it will work.
The Ceph code looks like it does something similar; see net/ceph/messenger.c.
Upvotes: 3