Andres Perez

Reputation: 159

Programmatically Determining the File System Block Size

This is the code I'm using to time how long it takes to read from a file, varying the number of bytes read.

for(int i = 0; i < TRIALS; i++){
    fd = open(argv[1], O_RDONLY);

    //Set the file offset to a random position in the file,
    //but still a multiple of the current test block size,
    //simulating jumps of the given test block size and
    //trying to avoid prefetch
    lseek(fd, test_block * (rand() % (fs / test_block)), SEEK_SET);

    //How much time it takes to read `test_block` bytes
    clock_gettime(CLOCK_MONOTONIC, &ts_ini);
    ssize_t bytes_read = read(fd, buffer, test_block);
    clock_gettime(CLOCK_MONOTONIC, &ts_end);

    if(bytes_read > 0){
        accum += (((double)(ts_end.tv_sec - ts_ini.tv_sec)) +
            (ts_end.tv_nsec - ts_ini.tv_nsec)/NANO_TO_SEC) / TRIALS;
    }
    //Close the file after each trial to release resources
    close(fd);
}

Now, if I run this program with this little shell script:

echo "Block Size(bytes) | Avg. Time(seconds)"
for block in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384
do
    ./bin/blocksize /tmp/random_gen $block
done

I get this result:

Block Size(bytes) | Avg. Time(seconds)
                4 | 0.002927567500    
                8 | 0.003120735600    
               16 | 0.004888980800    
               32 | 0.003885210600    
               64 | 0.003578379700    
              128 | 0.001272970500    
              256 | 0.004926633700    
              512 | 0.001281894000    
             1024 | 0.000243394200    
             2048 | 0.000175361100    
             4096 | 0.000001048200    
             8192 | 0.000001938000    
            16384 | 0.000003214000

From this result I'm assuming that the block size of my system is 4096 bytes (which agrees with dumpe2fs): at that point the system doesn't need to do any additional work, it just hands over the block it got from the file, so the read is very fast, and after that the times roughly double. (This is my guess.)

But here is the weird part: if I modify the shell script a little, dropping the caches before each execution, like this:

echo "Block Size(bytes) | Avg. Time(seconds)"
for block in 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384
do
    echo "echo 3 > /proc/sys/vm/drop_caches" | sudo sh
    ./bin/blocksize /tmp/random_gen $block
done

Then this happens:

Block Size(bytes) | Avg. Time(seconds)
                4 | 0.006217417300    
                8 | 0.003913319300    
               16 | 0.004674101500    
               32 | 0.005444699600    
               64 | 0.005125086700    
              128 | 0.004965967700    
              256 | 0.002433360800    
              512 | 0.002100266600    
             1024 | 0.002221131400    
             2048 | 0.001623008600    
             4096 | 0.001936151500    
             8192 | 0.001391976900    
            16384 | 0.001270749800

Which doesn't make any sense to me. Why do the times keep decreasing as the test block size increases when I drop the caches first?

Running this on Ubuntu 14.04 LTS, 64-bit.

Upvotes: 1

Views: 1341

Answers (2)

jim mcnamara

Reputation: 16399

Some points:

It is possible for filesystems to have different block sizes on one system.

When rereading the same file, it is very likely that you will get improved times because of caching. Most modern hard drives have an onboard cache, and the OS has a cache as well.

POSIX provides a standard way to get filesystem information such as the block size: the statvfs() system call. Like stat(), it fills in a struct. This is the one on my system; each implementation may have some extra or different fields, so yours may differ:

 u_long      f_bsize;             /* preferred file system block size */
 u_long      f_frsize;            /* fundamental filesystem block
                                     (size if supported) */
 fsblkcnt_t  f_blocks;            /* total # of blocks on file system
                                     in units of f_frsize */
 fsblkcnt_t  f_bfree;             /* total # of free blocks */
 fsblkcnt_t  f_bavail;            /* # of free blocks avail to
                                     non-privileged user */
 fsfilcnt_t  f_files;             /* total # of file nodes (inodes) */
 fsfilcnt_t  f_ffree;             /* total # of free file nodes */
 fsfilcnt_t  f_favail;            /* # of inodes avail to
                                     non-privileged user*/
 u_long      f_fsid;              /* file system id (dev for now) */
 char        f_basetype[FSTYPSZ]; /* target fs type name,
                                     null-terminated */
 u_long      f_flag;              /* bit mask of flags */
 u_long      f_namemax;           /* maximum file name length */
 char        f_fstr[32];          /* file system specific string */
 u_long      f_filler[16];        /* reserved for future expansion */

http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/statvfs.h.html
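For illustration, a minimal sketch (assuming Linux and the POSIX statvfs() call) that prints the two size fields for a given path:

#include <stdio.h>
#include <stdlib.h>
#include <sys/statvfs.h>

int main(int argc, char *argv[])
{
    if(argc != 2){
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return EXIT_FAILURE;
    }

    struct statvfs vfs;
    if(statvfs(argv[1], &vfs) == -1){
        perror("statvfs");
        return EXIT_FAILURE;
    }

    //f_bsize is the preferred I/O block size, f_frsize the
    //fundamental allocation unit of the filesystem
    printf("preferred block size:   %lu\n", (unsigned long)vfs.f_bsize);
    printf("fundamental block size: %lu\n", (unsigned long)vfs.f_frsize);
    return EXIT_SUCCESS;
}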

Upvotes: 2

Andrew Henle

Reputation: 1

File system cache and readahead, most likely.

The file system sees that you read the first 4K block, and it reads more than that on the assumption that you'll read the rest of the file.

Try opening with the O_RDONLY | O_DIRECT flags to bypass the file system cache. If the file system supports direct I/O, you should see a difference. You'll probably have to use valloc()/memalign() to get a memory-page-aligned buffer to read your data into.
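A minimal sketch of that approach, assuming a 4096-byte alignment requirement (the real requirement is the logical block size of the underlying device) and using posix_memalign() for the aligned buffer:

#define _GNU_SOURCE   //O_DIRECT is Linux-specific
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 4096 //assumed block size, adjust to your filesystem

int main(int argc, char *argv[])
{
    if(argc != 2){
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return EXIT_FAILURE;
    }

    //O_DIRECT needs the buffer (and usually the offset and length)
    //aligned to the logical block size
    void *buffer;
    int rc = posix_memalign(&buffer, BUF_SIZE, BUF_SIZE);
    if(rc != 0){
        fprintf(stderr, "posix_memalign: %s\n", strerror(rc));
        return EXIT_FAILURE;
    }

    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if(fd == -1){
        perror("open"); //EINVAL if the filesystem lacks direct I/O
        return EXIT_FAILURE;
    }

    ssize_t n = read(fd, buffer, BUF_SIZE);
    if(n == -1)
        perror("read");
    else
        printf("read %zd bytes bypassing the page cache\n", n);

    close(fd);
    free(buffer);
    return EXIT_SUCCESS;
}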

Upvotes: 0
