VTxyer

Reputation: 61

Is Linux kernel splice() zero copy?

I know splice() is designed for zero copy and uses the Linux kernel pipe buffer to achieve that. For example, if I want to copy data from one file descriptor (fp1) to another (fp2), the data does not have to travel "kernel space -> user space -> kernel space". Instead it stays in kernel space, and the flow looks like "fp1 -> pipe_read -> pipe_write -> fp2". My question is: does the kernel need to copy data in the "fp1 -> pipe_read" and "pipe_write -> fp2" steps?
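The fd1 -> pipe -> fd2 flow described above can be sketched in user space as follows, assuming Linux with glibc. `splice_copy` is a hypothetical helper name, not a kernel or libc API, and the sketch only shows the calling pattern: whether the kernel moves page references or copies bytes for each step is decided by the file types involved, which is exactly what the question asks.

```c
#define _GNU_SOURCE /* splice() is a Linux-specific extension exposed by glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to `len` bytes from in_fd to out_fd through an
 * intermediate pipe (fd1 -> pipe -> fd2). The data never passes through a
 * user-space buffer; whether pages are moved or copied is up to the kernel.
 * Returns the number of bytes copied (less than len on EOF), or -1 on error. */
static ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    size_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while (total < len) {
        /* fd1 -> pipe. SPLICE_F_MOVE is only a hint; splice(2) documents it
         * as a no-op since Linux 2.6.21. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len - total, SPLICE_F_MOVE);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        if (n == 0) /* EOF on the input */
            break;

        /* pipe -> fd2: drain everything that was just put into the pipe */
        size_t left = (size_t)n;
        while (left > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, left, SPLICE_F_MOVE);
            if (m < 0) {
                if (errno == EINTR)
                    continue;
                goto fail;
            }
            if (m == 0) { /* should not happen while the pipe holds data */
                errno = EIO;
                goto fail;
            }
            left -= (size_t)m;
        }
        total += (size_t)n;
    }

    close(p[0]);
    close(p[1]);
    return (ssize_t)total;

fail: {
        int saved = errno;
        close(p[0]);
        close(p[1]);
        errno = saved;
    }
    return -1;
}
```

The outer loop is there because a single splice() into a pipe transfers at most one pipe's worth of data (64 KiB by default on Linux), so larger copies take several rounds.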

The Wikipedia said that:

Ideally, splice and vmsplice work by remapping pages and do not actually copy any data, which may improve I/O performance. As linear addresses do not necessarily correspond to contiguous physical addresses, this may not be possible in all cases and on all hardware combinations.

I have already traced the kernel source (3.12) for my question. For the "fp1 -> write_pipe" step, I found that it eventually calls kernel_readv() in fs/splice.c, which then calls do_readv_writev() and finally aio_write():

static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
                            unsigned long vlen, loff_t offset)
// *vec points to the struct pages that belong to the pipe

For the "read_pipe -> fp2" step, it eventually calls __kernel_write(), which then calls fp2->f_op->write():

ssize_t __kernel_write(struct file *file, const char *buf, size_t count, loff_t *pos)
// *buf is the pipe buffer

I thought both aio_write() and file->f_op->write() perform a real data copy, so does splice() really perform zero copy?

Upvotes: 6

Views: 3534

Answers (2)

Damon

Reputation: 70126

splice most probably works zero-copy (there is no hard guarantee for that, but it almost certainly works that way on any reasonably recent hardware). Strictly following the docs, you would need to call it with SPLICE_F_MOVE so that no actual copies are made, but I don't see why it would need to make one either way as long as there is DMA support (which is a fairly safe assumption).

The same is not necessarily true when vmsplice is involved, since it (or a subsequent splice) only works zero-copy if the SPLICE_F_GIFT flag is provided. In that case I can see why it would not work otherwise, since the "source descriptor" is main memory. However, this flag is broken in some Linux versions, unsupported in others, and badly documented on top.
For example, it is not clear what to do with the memory afterwards. The documentation used to say that you are not allowed to touch the gifted memory ever after; this was recently reworded slightly, but it is no less ambiguous. It remains unclear what is to become of the memory region. Following the documentation, you would have to leak the memory, since there seems to be no notification mechanism that tells you when it is safe to free or reuse it.
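A minimal sketch of the gift pattern discussed above, assuming Linux with glibc. `gift_and_splice` is a hypothetical helper, and gifting exactly one whole mmap'ed page is an assumption based on the vmsplice(2) man page requirement that gifted data be page aligned in both address and length (the page is zero-padded, so the output receives a full page). Reflecting the ambiguity described in this answer, the sketch never modifies or unmaps the page once it has been gifted, i.e. it deliberately leaks it.

```c
#define _GNU_SOURCE /* vmsplice() and splice() are Linux-specific extensions */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy msg into a fresh page, gift that whole page to a
 * pipe with vmsplice(SPLICE_F_GIFT), then splice the pipe into out_fd.
 * Returns the number of bytes written (one full page), or -1 on error. */
static ssize_t gift_and_splice(int out_fd, const char *msg)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t msg_len = strlen(msg);
    size_t total = 0;
    int p[2];

    if (page <= 0 || msg_len > (size_t)page) {
        errno = EINVAL;
        return -1;
    }

    /* Anonymous mmap yields a zero-filled, page-aligned page */
    char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memcpy(buf, msg, msg_len);

    if (pipe(p) < 0) {
        munmap(buf, (size_t)page); /* not gifted yet, so safe to release */
        return -1;
    }

    /* user memory -> pipe: hand the page to the kernel as a gift */
    struct iovec iov = { .iov_base = buf, .iov_len = (size_t)page };
    ssize_t g;
    do {
        g = vmsplice(p[1], &iov, 1, SPLICE_F_GIFT);
    } while (g < 0 && errno == EINTR);
    if (g < 0)
        goto fail;
    if (g != (ssize_t)page) { /* a partial gift is treated as an error here */
        errno = EIO;
        goto fail;
    }

    /* pipe -> out_fd */
    while (total < (size_t)page) {
        ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)page - total,
                           SPLICE_F_MOVE);
        if (m < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        if (m == 0) {
            errno = EIO;
            goto fail;
        }
        total += (size_t)m;
    }

    close(p[0]);
    close(p[1]);
    /* buf is intentionally neither touched nor unmapped: it was gifted */
    return (ssize_t)total;

fail: {
        int saved = errno;
        close(p[0]);
        close(p[1]);
        errno = saved;
    }
    return -1;
}
```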

aio_write is the userland (glibc) implementation of asynchronous I/O, which uses threads and the write syscall. This normally performs at least one copy from user space to kernel space.

Upvotes: 1

ikstream

Reputation: 448

As I understand splice(), it reads the pages of fd1 and the MMU maps these pages. The reference created by the mapping is put into the pipe and handed over to fd2. No data should actually be copied in the process, as long as every participant has DMA available. If DMA is not available, the data has to be copied.

Upvotes: 3
