Reputation: 3334
I have written a little piece of code that uses the nftw function to walk a directory tree.
int flags = 0;
flags = flags | FTW_MOUNT;

int nopenfd = 10;
if (nftw(argv[1], sum_sizes, nopenfd, flags) == -1)
    return EXIT_FAILURE;
With these options, nftw does not descend into directories that are mount points (FTW_MOUNT) and dereferences symbolic links (the default behavior).
For each file, nftw calls this function:
/* total_size is the sum of the sizes of each file/directory/link */
long long int total_size = 0, total_real_size = 0;

static int sum_sizes(const char *pathname, const struct stat *statbuf, int typeflag, struct FTW *ftwbuf)
{
    /* if stat failed on the current file */
    if (typeflag == FTW_NS) {
        printf("No stats (permissions?) on %s\n", pathname);
        return 0;
    }

    total_size += (long long int)statbuf->st_size;
    total_real_size += (long long int)statbuf->st_blocks * 512;
    return 0;
}
At the end, I display the cumulative sizes:
printf("total size: %lld (%.2f K %.2f M)\n", total_size, (double)total_size / 1024.0, (double)total_size / (1024.0 * 1024.0));
printf("total real size: %lld (%.2f K %.2f M)\n", total_real_size, (double)total_real_size / 1024.0, (double)total_real_size / (1024.0 * 1024.0));
When I compare the values with du, I see some differences:
time ./scan_dir ~/
====>
total size: 15208413390 (14851966.00 K 14503.87 M)
total real size: 15708553216 (15340384.00 K 14980.84 M)
block size : 4096 / fond. block size : 4096
fs size: 22.7895 G
./scan_dir ~/ 0,03s user 0,24s system 98% cpu 0,277 total
time du -s ~/
15119876 /home/cedlemo/
du -s ~/ 0,07s user 0,22s system 98% cpu 0,287 total
Note: after reading the man page of du, I know that du has almost the same behavior as my little application scan_dir (it skips mount points, dereferences symbolic links, and uses 1024 to compute its value in K).
It seems that the closest value produced by my application is the total real size (blocks used), but the values are still not the same.
What could be the reason(s) for this difference? What am I doing wrong?
Upvotes: 0
Views: 1308
Reputation: 39298
By default, du does not follow symlinks. Your code does.
du -ks DIRECTORY/
is equivalent to
find DIRECTORY/ -printf '%k\n' | awk '{s+=$1} END { printf "%.0f\n", s }'
which looks at each directory entry only once, does not follow symlinks, does not cross mount points, and outputs the total sum of st_blocks*2 (i.e., in 1024-byte units). In other words, the number of 1024-byte units allocated for file and directory contents -- disk usage.
The sum of logical file and directory sizes, on the other hand, is
find DIRECTORY/ -printf '%s\n' | awk '{s+=$1} END { printf "%.0f\n", s / 1024.0 }'
which has nothing to do with disk usage, only with the apparent amount of information stored in files and directories. Usually this measurement is limited to regular files only, i.e.
find DIRECTORY/ -type f -printf '%s\n' | awk '{s+=$1} END { printf "%.0f\n", s / 1024.0 }'
so it basically tells the user how large a file they would get if they concatenated all their files into one huge file. Whether that is meaningful is debatable, but many users find it informative. In any case, it is a different measurement from disk usage.
In the file statistics (see man 2 fstat), st_blocks describes how many 512-byte units are allocated for the file contents, and st_size the logical size of the file.
Most filesystems support sparse files. It means that when you enlarge a file using truncate(), or by writing to an offset beyond the current file size, the filesystem does not store the skipped part at all. It is perfectly okay to read that part, however; it will always read as all-zeroes. Therefore, a huge file may only consume a few blocks, if it is mostly zeroes. (To be precise, "skipped zeroes". When creating a file, just writing zeroes does not produce a sparse file. Your application needs to skip writing the zeroes to produce a sparse file.)
It is also possible for the number of blocks to be larger than one would assume based on the file size, due to indirect blocks used by some files on some filesystems. There may be "extra blocks" allocated and accounted for, because the file is fragmented or otherwise special. And on all typical filesystems, the number of allocated blocks is rounded up to a multiple of the filesystem allocation size, anyway.
In your case, total size is the logical length you would get if you concatenated the contents of all your files and directories into a single file, including any duplicates reached through symlinks.
In your case, total real size describes the amount of disk space allocated for all files and directories in total, as if symlinks were replaced with copies of their targets.
If you change the flags to
flags = FTW_MOUNT | FTW_PHYS;
you should get a total real size that matches du -s.
Upvotes: 1