Why does using more threads result in a slower runtime?

Question

I am running Ubuntu on a machine with 32 CPUs (1 socket, 16 cores per socket, 2 threads per core).

I have a std::vector containing roughly 100 to 1000 objects, and I am trying to parallelize a for loop that reads data from each object in the vector and appends it to a file to log that object's state. There is one file per object. I experimented with omp_set_num_threads() and found a sweet spot at around 8 threads: with either more or fewer threads, the loop takes longer to run. Given that I have 32 logical CPUs available, I am not sure why increasing the thread count above 8 makes the loop slower. I know many similar questions have been asked before, but I cannot find a solution to my particular issue.

#include <experimental/filesystem>
#include <fstream>
#include <iomanip>
#include <omp.h>
namespace fs = std::experimental::filesystem;  

void log() {
    omp_set_num_threads(8);

    // Log all object states
    if(this->logstate){
        #pragma omp parallel for
        for(auto i = vObject.begin(); i < vObject.end(); ++i)   {
            fs::path filename = (*i)->get_filename();
            std::ofstream OutputFile;
            OutputFile.open(filename, std::ios::app);
            OutputFile << std::setw(30) << (*i)->get_EPOCH() << std::setw(20) << std::scientific << std::setprecision(5) << (*i)->get_state() << std::endl;
            OutputFile.close();
        }
    }
}
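
A minimal sketch for reproducing the thread-count sweep described above, under stated assumptions rather than the actual class code. LoggedObject, make_objects, log_all, and time_log_all are hypothetical names introduced only for illustration. The num_threads clause stands in for the omp_set_num_threads call, so the snippet needs no <omp.h> and still builds without -fopenmp (the loop then simply runs serially).

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <string>
#include <vector>

// Hypothetical stand-in for the objects stored in vObject.
struct LoggedObject {
    std::string filename;
    double epoch;
    double state;
};

// Build `count` dummy objects, each with its own log file name.
std::vector<LoggedObject> make_objects(std::size_t count, const std::string& prefix) {
    std::vector<LoggedObject> objects;
    objects.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        objects.push_back({prefix + std::to_string(i) + ".log",
                           static_cast<double>(i), 0.5 * static_cast<double>(i)});
    }
    return objects;
}

// Append one formatted line per object to that object's file,
// using at most `threads` OpenMP threads.
void log_all(const std::vector<LoggedObject>& objects, int threads) {
    const long n = static_cast<long>(objects.size());
    #pragma omp parallel for num_threads(threads)
    for (long i = 0; i < n; ++i) {
        std::ofstream out(objects[i].filename, std::ios::app);
        out << std::setw(30) << objects[i].epoch
            << std::setw(20) << std::scientific << std::setprecision(5)
            << objects[i].state << '\n';  // '\n' avoids the extra flush of std::endl
    }
}

// Wall-clock seconds spent on `repeats` consecutive calls to log_all.
double time_log_all(const std::vector<LoggedObject>& objects, int threads, int repeats) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        log_all(objects, threads);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling time_log_all(objects, t, repeats) for t = 1, 2, 4, 8, 16, 32 on the same set of objects and printing each result gives the comparison. Several repeats per thread count help smooth out noise from file-system caching.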

Any thoughts or suggestions would be appreciated.
