weno


Optimizing .txt file creation speed

I've written the following simple test code, which creates 10 000 empty .txt files in a subdirectory.

#include <iostream>
#include <time.h>
#include <string>
#include <fstream>
#include <cstdio>   // printf

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
        int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = ".\\results\\"+string_i+".txt";
        std::ofstream outfile(file_dir);
        i++;
    }
}

int main()
{
    clock_t tStart1 = clock();
    CreateFiles();
    printf("\nHow long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
    std::cin.get();
    return 0;
}

Everything works fine. All 10 000 .txt files are created in ~3.55 seconds (on my PC).

Question 1: Ignoring the conversion from int to std::string etc., is there anything I could optimize here so that the program creates the files faster? I specifically mean the std::ofstream outfile usage: perhaps using something else would be noticeably faster?

Regardless, ~3.55 seconds is satisfying compared to the following.

I modified the function so that it also fills each .txt file with the integer i and some constant text:

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
        int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = ".\\results\\"+string_i+".txt";
        std::ofstream outfile(file_dir);

        // Here is the part where I am filling the .txt with some data
        outfile << i << " some " << i << " constant " << i << " text " << i << " . . . " 
        << i << " --more text-- " << i << " --even more-- " << i;
        i++;
    }
}

And now everything (creating the .txt files and filling them with short data) takes... ~37 seconds. That's a huge difference, and that's only 10 000 files.

Question 2: Is there anything I can optimize here? Perhaps there is some alternative that fills the .txt files faster. Or perhaps I have forgotten something very obvious that slows down the entire process?

Or am I exaggerating a little, and ~37 seconds is normal and already optimized?

Thanks for sharing your insights!


Answers (1)

coder3101


File creation speed is hardware dependent: the faster the drive, the faster you can create the files.

To illustrate, I ran your code on an ARM processor (a Snapdragon 636 in a mobile phone, using Termux). Mobile phones have flash memory that is very fast when it comes to I/O, so it ran in under 3 seconds most of the time and sometimes took 5 seconds. This variation is expected, as the drive also has to handle reads and writes from other processes. You reported that it took 37 seconds on your hardware, so you can safely conclude that I/O speed depends significantly on the hardware.


Nonetheless, I tried to optimize your code, using two different approaches:

  • Using the C stdio functions (fopen/fprintf/fclose) for I/O.

  • Using C++ streams, but writing the complete line in one go.

I ran the benchmark 50 times on my phone. Here are the average results:

  • C was the fastest, taking 2.73928 seconds on average to write your text to 10000 files using fprintf.

  • C++ writing the complete line in one go took 2.7899 seconds. I used sprintf to build the complete line in a char[], then wrote it with the << operator on an ofstream.

  • Normal C++ (your code) took 2.8752 seconds.

This behaviour is expected: writing in chunks is faster. Read this answer for why. C was the fastest, no doubt.

Note that the difference is not that significant here, but if you are on hardware with slow I/O, it becomes significant.


Here is the code I used for the benchmark. You can test it yourself, but make sure to replace the std::system arguments with your own commands (they are different on Windows).

#include <iostream>
#include <time.h>
#include <string>
#include <fstream>
#include <stdio.h>
#include <cstdlib>   // std::system

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
       // int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results/"+string_i+".txt";
        std::ofstream outfile(file_dir);

        // Here is the part where I am filling the .txt with some data
        outfile << i << " some " << i << " constant " << i << " text " << i << " . . . " 
        << i << " --more text-- " << i << " --even more-- " << i;
        i++;
    }
}

void CreateFilesOneGo(){
    int i = 1;
    while(i<=10000){
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results3/" + string_i + ".txt";
        // Build the complete line in a buffer first, then write it in one go
        char buffer[256];
        snprintf(buffer, sizeof buffer, "%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",i,i,i,i,i,i,i);
        std::ofstream outfile(file_dir);
        outfile << buffer;
        i++;
    }
}
        
void CreateFilesFast(){
    int i = 1;
    while(i<=10000){
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results2/"+string_i+".txt";
        FILE *f = fopen(file_dir.c_str(), "w");
        if (f == NULL) {   // e.g. the results2 directory does not exist
            perror(file_dir.c_str());
            return;
        }
        fprintf(f,"%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",i,i,i,i,i,i,i);
        fclose(f);
        i++;
    }
}

int main()
{
    double normal = 0, one_go = 0, c = 0;
    for (int u=0;u<50;u++){
        std::system("mkdir results results2 results3");
        
        clock_t tStart1 = clock();
        CreateFiles();
        //printf("\nNormal : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        normal+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;
       
        tStart1 = clock();
        CreateFilesFast();
        //printf("\nIn C : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        c+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;
        
        tStart1 = clock();
        CreateFilesOneGo();
        //printf("\nOne Go : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        one_go+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;
        
        std::system("rm -rf results results2 results3");
        std::cout<<"Completed "<<u+1<<"\n";
    }
    
    std::cout<<"C on average took : "<<c/50<<"\n";
    std::cout<<"Normal on average took : "<<normal/50<<"\n";
    std::cout<<"One Go C++ took : "<<one_go/50<<"\n";
    
    return 0;
}

Also I used clang-7.0 as the compiler.

If you have another approach, let me know and I will test it too. If you find a mistake, please let me know and I will correct it as soon as possible.
