Web Developer

Reputation: 13

Calculating size of Directory in C

I want to calculate the size of a directory (path) recursively. My current code has a function that checks whether an entry is a directory or a file: if it's a directory, the function calls itself on the subdirectory, and if it's a file, it adds the file's size to the totalSize variable. However, my current code doesn't return anything, which means there is an error somewhere. Here is my code:

#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>

void getsize(const char *path,int* totalSize);

int main()
{
    int total = 0;
    char path [] = "C:\\Users\\abbas\\Desktop\\Leetcode airplane";
    getsize(path,&total);
    printf("%d",total);
    return total;

}


void getsize(const char *path,int* totalSize)
{
    struct dirent *pDirent;
    struct stat buf;
    struct stat info;
    DIR *pDir;
    int exists;
    char str[100];
    pDir = opendir (path);
    while ((pDirent = readdir(pDir)) != NULL)
    {
        stat(pDirent->d_name,&info);
        if(S_ISDIR(info.st_mode))
        {
            strcpy(str,path);
            strcat(str,"/");
            strcat(str,pDirent->d_name);
            getsize(str,totalSize);
        }
        else
        {
            strcpy(str,path);
            strcat(str,"/");
            strcat(str,pDirent->d_name);
            exists = stat(str,&buf);
            if (exists < 0)
            {
                continue;
            }
            else
            {
                (*totalSize) += buf.st_size;
            }

        }
    }
    closedir(pDir);
}

Upvotes: 1

Views: 670

Answers (3)

user9706

Reputation:

  1. Include string.h.
  2. The arbitrary fixed size of str[100] is problematic. If you are on Linux, include linux/limits.h and use str[PATH_MAX], or even better, query pathconf(path, _PC_PATH_MAX). In either case you should either ensure the buffer is big enough (using snprintf(), for instance) or dynamically allocate the buffer.
  3. You need to exclude . and .. otherwise you end up with an infinite loop (path/../path/..).
  4. stat(pDirent->d_name,&info) fails because you need to stat() path/pDirent->d_name, not just pDirent->d_name.
  5. (not fixed) Maybe snprintf(path2, sizeof path2, "%s%s%s", path, PATH_SEP, pDirent->d_name) instead of strcpy() and strcat()?
  6. Check the return values of functions; otherwise you waste time chasing silent failures.
  7. There is no point in making two stat() calls on the same path, so reuse the result of the first call for (*totalSize) += buf.st_size;.
  8. (not fixed) On Windows, consider using _stat64() with the address of a struct __stat64 (@AndrewHenle).
  9. I assume you only want the size of files.
  10. (not fixed) It would be more natural if getsize() returned the size instead of using int *totalSize out parameter.
  11. (not fixed) Consider using nftw() (or the older ftw()) to walk the tree.

Note that the program now accepts the path on the command line for testing purposes.

#include <dirent.h>
#include <errno.h>
#include <linux/limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

const char *PATH_SEP =
#ifdef _WIN32
    "\\";
#else
     "/";
#endif

void getsize(const char *path,int *totalSize) {
    struct dirent *pDirent;
    DIR *pDir = opendir (path);
    if(!pDir) {
        perror("opendir");
        return;
    }
    while ((pDirent = readdir(pDir)) != NULL) {
        if(
            !strcmp(pDirent->d_name, ".") ||
            !strcmp(pDirent->d_name, "..")
        )
            continue;

        char path2[PATH_MAX];
        strcpy(path2, path);
        strcat(path2, PATH_SEP);
        strcat(path2, pDirent->d_name);
        struct stat info;
        if(stat(path2, &info) == -1) {
            perror("stat");
            closedir(pDir); /* do not leak the directory stream */
            return;
        }
        if(S_ISDIR(info.st_mode))
            getsize(path2, totalSize);
        else if(S_ISREG(info.st_mode))
            (*totalSize) += info.st_size;
    }
    closedir(pDir);
}

int main(int argc, char *argv[]) {
    if(argc != 2) {
        printf("usage: your_program path\n");
        return 1;
    }
    int total = 0;
    getsize(argv[1], &total);
    printf("%d\n",total);
}

And an example test:

$ mkdir -p 1/2
$ dd if=/dev/zero of=1/file count=123
123+0 records in
123+0 records out
62976 bytes (63 kB, 62 KiB) copied, 0.000336838 s, 187 MB/s
$ dd if=/dev/zero of=1/2/file count=234
234+0 records in
234+0 records out
119808 bytes (120 kB, 117 KiB) copied, 0.0015842 s, 75.6 MB/s
$ echo $((62976 + 119808))
182784
$ ./your_program 1
182784

Upvotes: 5

chux

Reputation: 153303

Practice safe coding.

The code below risks a buffer overflow.

        // Risky
        strcpy(str,path);
        strcat(str,"/");
        strcat(str,pDirent->d_name);

Had the code done this instead,

int len = snprintf(str, sizeof str, "%s/%s", path, pDirent->d_name);
if (len < 0 || (unsigned) len >= sizeof str) {
  fprintf(stderr, "Path too long %s/%s\n", path, pDirent->d_name);
  exit (-1);  
}

Then the code would have readily errored out due to recursion on "." and "..", leading OP to discover a key problem without outside help.

This makes for faster code production and more resilient code. It saves OP time.
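
The length check above could be wrapped in a small helper so that every caller detects truncation the same way. This is only a sketch: join_path() is a hypothetical name, not part of the question's code, and it reports failure to the caller instead of calling exit().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "dir/name" into out, which holds cap bytes.
 * Returns 0 on success, or -1 if formatting failed or the result
 * (including the terminating NUL) would not fit in the buffer. */
int join_path(char *out, size_t cap, const char *dir, const char *name)
{
    int len = snprintf(out, cap, "%s/%s", dir, name);
    if (len < 0 || (size_t)len >= cap)
        return -1;
    return 0;
}
```

With a helper like this, runaway recursion through "." and ".." shows up as a -1 return once the path outgrows the buffer, rather than as silent memory corruption.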

Upvotes: 1

Wert

Reputation: 41

I think the major error of your code lies in the recursive logic.

To quote p. 183 of The C Programming Language:

Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Therefore, maybe you can try adding the following if test at the beginning of the while loop:

while ((pDirent = readdir(pDir)) != NULL)
{
    if (strcmp(pDirent->d_name, ".") == 0
        || strcmp(pDirent->d_name, "..") == 0)
        continue;  /* skip self and parent */
    /* ... */
}

Still, there might be other errors, but I think this one is the most significant.
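
To show how the skip fits into a complete traversal, here is a minimal sketch that assumes a POSIX system. The name dir_size() and the 4096-byte buffer are assumptions made for illustration, not part of the question. It returns the total instead of using an out parameter, builds each full path before calling lstat(), and silently skips entries it cannot read.

```c
#define _POSIX_C_SOURCE 200809L   /* for lstat() under strict compiler modes */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: total bytes of regular files under path,
 * or -1 if path itself cannot be opened as a directory. */
long long dir_size(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }

    long long total = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0
            || strcmp(entry->d_name, "..") == 0)
            continue;  /* skip self and parent */

        char child[4096];  /* assumed size; overlong paths are skipped */
        int len = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (len < 0 || (size_t)len >= sizeof child)
            continue;

        struct stat info;
        if (lstat(child, &info) == -1)
            continue;  /* unreadable entry: skip it */

        if (S_ISDIR(info.st_mode)) {
            long long sub = dir_size(child);
            if (sub > 0)
                total += sub;  /* ignore subdirectories that failed to open */
        } else if (S_ISREG(info.st_mode)) {
            total += info.st_size;
        }
    }
    closedir(dir);
    return total;
}
```

Because lstat() does not follow symbolic links, a link that points back up the tree is neither counted nor recursed into.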

Upvotes: 3
