Reputation: 11629
I am trying to compare the performance of mmap() and read() for file sizes ranging from 1KB to 1GB (increasing by a factor of 10). In both cases I read the entire file sequentially, write the output to another file, and measure the time.
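The timing is taken around the copy loop. A minimal sketch of how such a measurement can be done (assuming clock_gettime() with a monotonic clock; the names time_copy and copy_fn are placeholders, not my exact harness):

#include <time.h>

/* Sketch only: wrap the copy loop with a monotonic clock and report the
 * elapsed seconds. copy_fn stands in for either the read()/write() loop
 * or the mmap() loop. */
static double time_copy(void (*copy_fn)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    copy_fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}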
Code:
For the read() code, I have:

char text[1000];
. . . . . .
while ((bytes_read = read(d_ifp, text, 1000)) > 0)
{
    write(d_ofp, text, bytes_read);
}
And for the mmap() code, I have:

//char *data;
uintmax_t *data;
//int *data;
. . . . . .
if ((data = (uintmax_t *)mmap((caddr_t)0, sbuf.st_size, PROT_READ, MAP_SHARED, fd, 0)) == (uintmax_t *)(-1))
{
    perror("mmap");
    exit(1);
}

int j = 0;
while (i <= sbuf.st_size)
{
    fprintf(ofp, "data[%d]=%ju\n", i, data[j]);
    i = i + sizeof(*data);
    j++;
}
The calculated time in the mmap() case varies depending upon how I declare my data pointer (char, int, uintmax_t), whereas in the read() case it varies depending upon the size of the text buffer.
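To make the buffer-size dependence concrete, here is a sketch of the same read()/write() loop with a tunable buffer (copy_fd and BUF_SIZE are placeholders; my current code uses 1000). The number of read()/write() pairs is roughly file_size / BUF_SIZE, so a 1GB file needs about a million pairs with a 1000-byte buffer but only about a thousand with a 1MB buffer:

#include <unistd.h>

#define BUF_SIZE (1 << 20)   /* placeholder; 1000 in my current code */

/* Sketch: same copy loop, but the number of read()/write() system calls
 * is roughly file_size / BUF_SIZE, so a bigger buffer means fewer syscalls. */
static void copy_fd(int in_fd, int out_fd)
{
    static char buf[BUF_SIZE];
    ssize_t n;
    while ((n = read(in_fd, buf, sizeof buf)) > 0)
        write(out_fd, buf, n);
}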
Output:
Right now mmap is proving to be really slow, which is surprising:
[read]: f_size: 1KB, Time: 8e-06 seconds
[read]: f_size: 10KB, Time: 1.4e-05 seconds
[read]: f_size: 100KB, Time: 8.3e-05 seconds
[read]: f_size: 1MB, Time: 0.000612 seconds
[read]: f_size: 10MB, Time: 0.009652 seconds
[read]: f_size: 100MB, Time: 0.12094 seconds
[read]: f_size: 1GB, Time: 6.5787 seconds
[mmap]: f_size: 1KB, Time: 0.002922 seconds
[mmap]: f_size: 10KB, Time: 0.004116 seconds
[mmap]: f_size: 100KB, Time: 0.020122 seconds
[mmap]: f_size: 1MB, Time: 0.22538 seconds
[mmap]: f_size: 10MB, Time: 2.2079 seconds
[mmap]: f_size: 100MB, Time: 22.691 seconds
[mmap]: f_size: 1GB, Time: 276.36 seconds
Question:
1. If I make the buffer size in the read code equal to the element type size in the mmap code, will the evaluation be correct/justified?
2. What is the right way to compare these two?
Edit:
I changed the fprintf in the mmap code to write; the performance is much better now, but it is strange: the time decreases for larger file sizes. Is that expected? (I am writing my data to /dev/null in both cases; a rough sketch of the change follows the timings below):
[mmap]: f_size: 1KB, Time: 3.3e-05 seconds
[mmap]: f_size: 10KB, Time: 2e-06 seconds
[mmap]: f_size: 100KB, Time: 2e-06 seconds
[mmap]: f_size: 1MB, Time: 4e-06 seconds
[mmap]: f_size: 10MB, Time: 3e-06 seconds
[mmap]: f_size: 100MB, Time: 2e-06 seconds
[mmap]: f_size: 1GB, Time: 2e-06 seconds
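The change is roughly along these lines (a sketch, not my exact code; dump_mapping and null_fd are placeholder names): instead of one fprintf() per element, the mapped region is handed to write():

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

/* Sketch (not the exact code): dump the whole mapping to the output
 * descriptor in one write() instead of one fprintf() per element.
 * null_fd is assumed to be an open descriptor for /dev/null. */
static void dump_mapping(int null_fd, const void *data, off_t size)
{
    if (write(null_fd, data, (size_t)size) < 0)
        perror("write");
}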
Upvotes: 0
Views: 373
Reputation: 182639
This is somewhat speculation because I probably haven't thought of all the implications:
In the first case the majority of the time is taken by:

- read(2) system calls (many of them)
- write(2) system calls that don't take any time beyond syscall overhead (see below)

In the second case the majority of the time is taken by:

- mmap(). This does not actually read anything. The kernel just checks you have permissions and pretends to "map" the data.
- write(2) system calls. I assume you perform fewer write calls for a larger file.

In Linux, writing to /dev/null is implemented like this:
static ssize_t write_null(struct file *file, const char __user *buf,
                          size_t count, loff_t *ppos)
{
    return count;
}
Which roughly means: "just tell the process we did it". So the mmapped memory is never touched and the file is never read; each write only incurs the cost of performing a system call. The fewer writes you do, the less time you waste in system calls that don't do anything anyway.
In conclusion, in both cases the writes are cheap, no-op calls. But in the first case the reads actually cost something, because data must really be pulled from the file.
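If you want to convince yourself of this, a quick sketch like the following should show the asymmetry (the file name "bigfile" and the counts N and SZ are placeholders; error checking is omitted): a pile of write()s to /dev/null costs far less than the same number of read()s that have to move real data.

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Sketch: time N no-op writes to /dev/null versus N reads that actually
 * move data. "bigfile" is a placeholder for any file of at least
 * N * SZ bytes; error checking is omitted for brevity. */
int main(void)
{
    enum { N = 100000, SZ = 4096 };
    static char buf[SZ];
    int null_fd = open("/dev/null", O_WRONLY);
    int file_fd = open("bigfile", O_RDONLY);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        write(null_fd, buf, SZ);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("writes to /dev/null: %g s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        read(file_fd, buf, SZ);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("reads from a file:   %g s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}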
Why was it so slow in the printf case? In that case you were actively touching the mmapped memory, forcing the kernel to stop lying and actually read the data from the file. In addition to that you also printed it which, depending on the buffering stdio was using, also triggered system calls from time to time. If you happened to be writing to the screen, this was especially costly since stdout is line-buffered by default.
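If you want to measure the cost of the mapping itself without stdio in the way, one option (a sketch, not the code from the question; touch_mapping is a made-up name) is to just touch one byte per page of the mapping, which forces the kernel to fault the data in:

#include <stddef.h>
#include <unistd.h>

/* Sketch: touch one byte per page so the kernel has to actually read the
 * mapped file; the running sum only prevents the loop from being
 * optimized away. */
static unsigned long touch_mapping(const unsigned char *map, size_t size)
{
    unsigned long sum = 0;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < size; off += page)
        sum += map[off];
    return sum;
}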
Upvotes: 3