Reputation: 159
This is the code I'm using to measure how long it takes to read from a file while varying the number of bytes read per call.
for (int i = 0; i < TRIALS; i++) {
    fd = open(argv[1], O_RDONLY);
    // Set the file offset to a random position in the file that is
    // still a multiple of the current test block size, simulating
    // jumps of the given block size to try to avoid prefetch.
    // The parentheses matter: * and % have equal precedence, so
    // test_block * rand() % n parses as (test_block * rand()) % n.
    lseek(fd, (off_t)test_block * (rand() % (fs / test_block)), SEEK_SET);
    // How long does it take to read `test_block` bytes?
    clock_gettime(CLOCK_MONOTONIC, &ts_ini);
    ssize_t bytes_read = read(fd, buffer, test_block);
    clock_gettime(CLOCK_MONOTONIC, &ts_end);
    if (bytes_read > 0) {
        accum += (((double)(ts_end.tv_sec - ts_ini.tv_sec)) +
                  (ts_end.tv_nsec - ts_ini.tv_nsec) / NANO_TO_SEC) / TRIALS;
    }
    // Close the file after each trial to release resources
    close(fd);
}
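The two pieces of arithmetic in the loop above (the elapsed time and the block-aligned random offset) can be isolated into small helpers. This is only a sketch: `elapsed_seconds` and `random_block_offset` are hypothetical names that are not part of the original program, and the second helper assumes 0 < block <= file_size.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: seconds elapsed between two clock_gettime() samples. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: a random offset that is a multiple of `block` and
 * leaves room for one full block before the end of the file.
 * Assumes 0 < block <= file_size. */
static off_t random_block_offset(off_t file_size, off_t block)
{
    off_t blocks = file_size / block;        /* number of whole blocks */
    return block * ((off_t)rand() % blocks); /* pick a block, then scale */
}
```

Picking the block index first and multiplying afterwards keeps the result aligned regardless of operator precedence.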
Now, if I run this program with this little shell script:
echo "Block Size(bytes) | Avg. Time(seconds)"
for block in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384
do
    ./bin/blocksize /tmp/random_gen $block
done
I get this result:
Block Size(bytes) | Avg. Time(seconds)
4 | 0.002927567500
8 | 0.003120735600
16 | 0.004888980800
32 | 0.003885210600
64 | 0.003578379700
128 | 0.001272970500
256 | 0.004926633700
512 | 0.001281894000
1024 | 0.000243394200
2048 | 0.000175361100
4096 | 0.000001048200
8192 | 0.000001938000
16384 | 0.000003214000
From this result I'm assuming that the block size of my system is 4096 bytes (which agrees with dumpe2fs): at that size no additional work is needed, the kernel just passes along the block it got from the file, so the read is very fast, and beyond that the time roughly doubles each time the block size doubles. (This is my guess.)
But here is the weird part: if I modify the shell script slightly to drop the caches before each execution, like this:
echo "Block Size(bytes) | Avg. Time(seconds)"
for block in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384
do
    echo "echo 3 > /proc/sys/vm/drop_caches" | sudo sh
    ./bin/blocksize /tmp/random_gen $block
done
Then this happens:
Block Size(bytes) | Avg. Time(seconds)
4 | 0.006217417300
8 | 0.003913319300
16 | 0.004674101500
32 | 0.005444699600
64 | 0.005125086700
128 | 0.004965967700
256 | 0.002433360800
512 | 0.002100266600
1024 | 0.002221131400
2048 | 0.001623008600
4096 | 0.001936151500
8192 | 0.001391976900
16384 | 0.001270749800
Which doesn't make any sense to me. Why do the times keep decreasing as the test block size increases when I drop the caches first?
Running this on Ubuntu 14.04 LTS, 64-bit.
Upvotes: 1
Views: 1341
Reputation: 16399
Some points:
It is possible for filesystems to have different block sizes on one system.
When rereading the same file, it is very likely that you will get improved times because of caching. Most modern hard drives have an onboard cache, and the OS has a cache as well.
POSIX provides a standard way to get filesystem information such as the block size: the statvfs function. Like stat, it fills in a struct. This is the one on my system; each implementation may have extra or different fields, so yours may differ:
u_long f_bsize; /* preferred file system block size */
u_long f_frsize; /* fundamental filesystem block
(size if supported) */
fsblkcnt_t f_blocks; /* total # of blocks on file system
in units of f_frsize */
fsblkcnt_t f_bfree; /* total # of free blocks */
fsblkcnt_t f_bavail; /* # of free blocks avail to
non-privileged user */
fsfilcnt_t f_files; /* total # of file nodes (inodes) */
fsfilcnt_t f_ffree; /* total # of free file nodes */
fsfilcnt_t f_favail; /* # of inodes avail to
non-privileged user*/
u_long f_fsid; /* file system id (dev for now) */
char f_basetype[FSTYPSZ]; /* target fs type name,
null-terminated */
u_long f_flag; /* bit mask of flags */
u_long f_namemax; /* maximum file name length */
char f_fstr[32]; /* file system specific string */
u_long f_filler[16]; /* reserved for future expansion */
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/statvfs.h.html
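As a rough illustration of the statvfs call described above, the following sketch queries the preferred block size of the filesystem that holds a given path. The helper name `fs_block_size` is hypothetical; only `statvfs()` and the `f_bsize` field come from POSIX.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: preferred I/O block size of the filesystem that
 * contains `path`, or 0 if statvfs() fails (e.g. the path does not exist). */
static unsigned long fs_block_size(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return 0;
    return sv.f_bsize;
}
```

Calling it with the path of the test file (for example `/tmp/random_gen` from the question) reports the block size of whichever filesystem that file lives on, which may differ from the root filesystem.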
Upvotes: 2
Reputation: 1
File system cache and readahead, most likely.
The file system sees you read the first 4k block and reads more than that, on the assumption that you'll read the rest of the file.
Try using the O_RDONLY | O_DIRECT flags to bypass the file system cache. If the file system supports direct I/O, you should see a difference. You'll probably have to use valloc()/memalign() to get a memory-page-aligned buffer to read your data into.
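A minimal sketch of that suggestion, assuming Linux with glibc: `direct_read` is a hypothetical helper, and it uses `posix_memalign()` (the POSIX counterpart to valloc()/memalign()) for the aligned buffer. With O_DIRECT, the offset and length generally also need to be multiples of the device's logical block size, and some filesystems (tmpfs, for example) reject the flag, in which case the helper simply returns -1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must precede all includes; exposes O_DIRECT on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read `len` bytes at `offset` from `path`, bypassing
 * the page cache. `align` must be a power of two and a multiple of
 * sizeof(void *). The data is discarded, since only the timing matters here.
 * Returns the number of bytes read, or -1 on any failure. */
static ssize_t direct_read(const char *path, off_t offset, size_t len, size_t align)
{
#ifdef O_DIRECT
    void *buf = NULL;
    if (posix_memalign(&buf, align, len) != 0)
        return -1;                        /* bad alignment or out of memory */
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        free(buf);
        return -1;                        /* missing file or no direct I/O */
    }
    ssize_t n = pread(fd, buf, len, offset);
    close(fd);
    free(buf);
    return n;
#else
    /* O_DIRECT is not available in this build environment. */
    (void)path; (void)offset; (void)len; (void)align;
    return -1;
#endif
}
```

Wrapping a call such as `direct_read(argv[1], offset, test_block, 4096)` in the two `clock_gettime()` calls would time the device rather than the cache, although block sizes below the logical block size will fail instead of being served from memory.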
Upvotes: 0