mca-surround

Reputation: 87

Checking file existence, size and similarity in C++

I am new to C++ and am trying to do a few things with my code. I have been researching how to do them but haven't been able to get my head around it, and so far I have been unsuccessful.

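#include <cstdlib>   // system
#include <fstream>   // ifstream, ofstream
#include <iostream>  // cin, cout, cerr

using namespace std;

bool Copy(char filenamein[], char filenameout[]);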

int main(int argc, char **argv)
{
    if (argc !=3) {
        cerr << "Usage: " << argv[0] << " <input filename> <output filename>" << endl;
        int keypress; cin >> keypress;
        return -1;
    }

    if (Copy(argv[1], argv[2]))
        cout << "Copy completed" << endl;
    else
        cout << "Copy failed!" << endl;

    system("pause");

    return 0;
}

bool Copy(char filenamein[], char filenameout[])
{
    ifstream fin(filenamein);
    if(fin.is_open())
    {
        ofstream fout(filenameout);

        char c;
        while(fin.get(c))
        {
            fout.put(c);
        }

        fout.close();
        fin.close();

        return true;
    }

    return false;
}

This code already creates 2 text files, input.txt and output.txt. Both files also contain the same items/characters.

What I'm trying to do is check whether the input.txt file already exists before trying to copy it.

I also want to check that both files are the same and that their file sizes are equal.

How do I go about doing this?

Upvotes: 1

Views: 1362

Answers (2)

Praxeolitic

Reputation: 24079

For general filesystem operations there's Boost Filesystem.

http://www.boost.org/doc/libs/1_57_0/libs/filesystem/doc/index.htm

To compare files you can calculate hashes and compare the hashes. For two files it would be just as efficient to compare them character by character but for more than two files comparing hashes wins.

For this there's Crypto++.

http://www.cryptopp.com/

Here is an example that uses the two libraries to solve the 3 problems in the question:

// C++ standard library
#include <cstdint>   // uintmax_t
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>

// Boost
#include <boost/filesystem.hpp>

// Crypto++
#include <cryptopp/sha.h>
#include <cryptopp/hex.h>
#include <cryptopp/files.h>

using std::string;

const string file_hash(const boost::filesystem::path &file);

int main( int argc, char** argv) {
    if (argc != 3)
    {
        std::cout << "Usage: " << argv[0]  << " filepath1 filepath2\n";
        return 1;
    }

    const string filename1(argv[1]);
    const string filename2(argv[2]);
    std::cout << "filename 1: " << filename1 << std::endl;
    std::cout << "filename 2: " << filename2 << std::endl;

    // file existence
    const bool file_exists1 = boost::filesystem::exists(filename1);
    const bool file_exists2 = boost::filesystem::exists(filename2);
    std::cout << "file 1 exists: " << std::boolalpha << file_exists1 << std::endl;
    std::cout << "file 2 exists: " << std::boolalpha << file_exists2 << std::endl;

    if (!file_exists1 || !file_exists2)
        return EXIT_SUCCESS;

    // file size
    const boost::filesystem::path file_path1(filename1);
    const boost::filesystem::path file_path2(filename2);

    const uintmax_t file_size1 = boost::filesystem::file_size(file_path1);
    const uintmax_t file_size2 = boost::filesystem::file_size(file_path2);
    std::cout << "file 1 size: " << file_size1 << std::endl;
    std::cout << "file 2 size: " << file_size2 << std::endl;

    // comparing files
    const string hash1 = file_hash(file_path1);
    const string hash2 = file_hash(file_path2);
    std::cout << "hash1: " << hash1 << std::endl;
    std::cout << "hash2: " << hash2 << std::endl;

    const bool same_file = hash1 == hash2;
    std::cout << "same file: " << same_file << std::endl;
}

const string file_hash(const boost::filesystem::path& file)
{
    string result;
    CryptoPP::SHA1 hash;
    CryptoPP::FileSource(file.string().c_str(),true,
            new CryptoPP::HashFilter(hash, new CryptoPP::HexEncoder(
                    new CryptoPP::StringSink(result), true)));
    return result;

}

Compilation on my laptop (the directories will of course be specific to wherever you have the headers and libraries but these are how homebrew installs them on OS X):

clang++ -I/usr/local/include -L/usr/local/lib -lcryptopp -lboost_system -lboost_filesystem demo.cpp -o demo

Example usage:

$ ./demo demo.cpp demo.cpp
filename 1: demo.cpp
filename 2: demo.cpp
file 1 exists: true
file 2 exists: true
file 1 size: 2084
file 2 size: 2084
hash1: 57E2E81D359C01DA02CB31621C9565DF0BCA056E
hash2: 57E2E81D359C01DA02CB31621C9565DF0BCA056E
same file: true
$ ./demo demo.cpp Makefile
filename 1: demo.cpp
filename 2: Makefile
file 1 exists: true
file 2 exists: true
file 1 size: 2084
file 2 size: 115
hash1: 57E2E81D359C01DA02CB31621C9565DF0BCA056E
hash2: 02676BFDF25FEA9E3A4D099B16032F23C469E70C
same file: false

Boost Filesystem throws exceptions if you try to do something like get the size of a file that doesn't exist. Since you should have a catch block for those exceptions anyway, you don't need to test for file existence explicitly. (If all you want to know is whether a file exists, and you don't intend to do anything with it, then it makes sense to test for existence explicitly.)
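A minimal sketch of that catch-block pattern follows. Note the assumption: it uses C++17's std::filesystem rather than Boost Filesystem, so the exception type is std::filesystem::filesystem_error instead of boost::filesystem::filesystem_error. The helper name try_file_size is hypothetical and is not part of either library.

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>
#include <optional>

namespace fs = std::filesystem;

// Hypothetical helper: returns the size of the file in bytes, or std::nullopt
// if the size could not be determined (for example, the file is missing).
std::optional<std::uintmax_t> try_file_size(const fs::path& p)
{
    try {
        return fs::file_size(p);  // throws fs::filesystem_error on failure
    } catch (const fs::filesystem_error& e) {
        std::cerr << "could not get file size: " << e.what() << '\n';
        return std::nullopt;
    }
}
```

With this shape there is no separate existence check: a missing file simply ends up in the catch block and the caller sees an empty optional.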

This is how I would go about doing these things in practice. If what you're asking is how to do them without libraries, you can check whether a file exists by trying to open it with the C or C++ standard library and checking whether that succeeded. To check a file's size, you can open the file, seek to the end, and read the resulting position, which is the offset from the beginning of the file.
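As a rough sketch of the standard-library-only approach, assuming that "can be opened for reading" is an acceptable stand-in for "exists" (a file that exists but is unreadable would report false). The helper names can_open and size_by_seeking are hypothetical.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: treats "can be opened for reading" as "exists".
bool can_open(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return in.is_open();
}

// Hypothetical helper: size in bytes found by opening the file positioned at
// its end, or -1 if the file cannot be opened or the position is unavailable.
long long size_by_seeking(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // start at the end
    if (!in.is_open())
        return -1;
    const std::streamoff pos = in.tellg();  // tellg() yields -1 on failure
    return static_cast<long long>(pos);
}
```

Opening in binary mode matters here: in text mode the reported position does not have to equal the number of bytes on disk.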

However, it's preferable to rely on operating system support to interact with filesystems in general.

https://www.securecoding.cert.org/confluence/display/seccode/FIO19-C.+Do+not+use+fseek%28%29+and+ftell%28%29+to+compute+the+size+of+a+regular+file

fstat(), for example, is specific to Unix and Unix-like systems and returns a struct containing file size data, whereas on Microsoft systems you use GetFileSizeEx() to get a file size. Because of this, if you want a portable solution, you have to use libraries that interact with the various operating systems for you and present a consistent API across them.

Comparing files using only standard library support can be done by either implementing hashing functions or comparing files character by character.
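The character-by-character option could be sketched roughly like this, using only the standard library. The helper name same_contents is hypothetical, and the sketch assumes C++14 or later for the four-iterator overload of std::equal.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true if both files can be opened and hold identical bytes.
bool same_contents(const std::string& path_a, const std::string& path_b)
{
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a.is_open() || !b.is_open())
        return false;

    // istreambuf_iterator reads raw characters and does not skip whitespace.
    std::istreambuf_iterator<char> a_it(a), b_it(b), end;

    // The four-iterator overload also returns false when the lengths differ.
    return std::equal(a_it, end, b_it, end);
}
```

Because the comparison stops at the first mismatch, two files that differ early are rejected without reading either one to the end, which a hash-based comparison cannot do.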

Upvotes: 2

Nicolas Defranoux

Reputation: 2676

Look at fstat (or stat, which takes a path instead of an open file descriptor). It will tell you the file size, and stat returns an error if the file does not exist.

You could also force the last update date of the copied file to be the same as the source file, so that if the source file changes but keeps the same size you will notice it (look at futimes to do so).
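A rough sketch of the timestamp idea follows. Note the assumption: instead of futimes, which is specific to Unix-like systems, it uses std::filesystem::last_write_time from C++17. The helper name copy_keeping_timestamp is hypothetical.

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper: copies source to dest, then gives dest the same
// last-modification time as source. Returns false if either file cannot be opened.
bool copy_keeping_timestamp(const fs::path& source, const fs::path& dest)
{
    std::ifstream in(source, std::ios::binary);
    if (!in.is_open())
        return false;
    {
        std::ofstream out(dest, std::ios::binary);
        if (!out.is_open())
            return false;
        out << in.rdbuf();  // copy all bytes from source to dest
    }                       // dest is closed here, before its timestamp is set

    fs::last_write_time(dest, fs::last_write_time(source));
    return true;
}
```

A later check can then compare both fs::file_size and fs::last_write_time of the two files: if the source has been modified since the copy, its timestamp will normally no longer match, even when the size is unchanged.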

Upvotes: 0
