viz12

Reputation: 725

OpenMP file reading in C++

I have a piece of code that reads a file line by line and stores two corresponding binary representations in two vectors. But after the loop, the sizes of the vectors and the total number of lines processed are zero.

    int numLines = 319426908; // calculated before for loop
    char temp[100];
    vector<long long int> p1, p2;
    long long int c = 0;

    #pragma omp parallel for schedule(dynamic) shared(c, p1, p2, fp) private(temp)
      for(int i=0; i<numLines; i++){
        if(fgets(temp, 100, fp) != NULL){
          temp[strlen(temp)-1] = '\0';
          long long int *A = toLongLong(temp);
          p1.push_back(A[0]);
          p2.push_back(A[1]);
          c++;
        }
      }
      cout << "Completed ...c = " << c << endl;
      cout << "p1.size: " << p1.size() << " p2.size: " << p2.size() << endl;

This is the output

Completed ...c = 0
p1.size: 0 p2.size: 0

Where am I going wrong in the above piece of code?

Upvotes: 3

Views: 4846

Answers (2)

Henri Menke

Reputation: 10939

To add an explicit example to MateusMP's answer: you should refactor your code to use more of the C++ standard library. It has many functions and data structures that will make your life easier and your code more readable.

#include <array>
#include <fstream>
#include <string>
#include <vector>

std::array<long long,2> toLongLong(std::string);

int main()
{
  int c = 0;
  std::vector<long long> p1;
  std::vector<long long> p2;

  int n_lines = 10;
  std::vector<std::string> lines(n_lines);

  // Read the file
  std::ifstream file("test.txt" , std::ios::in);
  for ( int i = 0; i < n_lines; ++i )
    std::getline(file, lines[i]);
  file.close();

  // Process the contents in parallel
  #pragma omp parallel
  {
    int c_local = 0;
    std::vector<long long> p1_local;
    std::vector<long long> p2_local;

    #pragma omp for
    for ( int i = 0; i < n_lines; ++i )
    {
      std::array<long long,2> A = toLongLong(lines[i]);
      p1_local.push_back(A[0]);
      p2_local.push_back(A[1]);
      ++c_local;
    }

    #pragma omp critical
    {
      p1.insert(p1.end(), p1_local.begin(), p1_local.end());
      p2.insert(p2.end(), p2_local.begin(), p2_local.end());
      c += c_local;
    }
  }
}

This compiles but doesn't link because toLongLong is not implemented.
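To make the example above link and run, here is one hypothetical implementation of toLongLong. It assumes each input line holds two whitespace-separated integers (e.g. "12345 67890") — the question does not specify the actual line format, so treat this purely as a placeholder:

```cpp
#include <array>
#include <sstream>
#include <string>

// Hypothetical stand-in: parses two whitespace-separated integers
// from a line. The real format in the question is unknown.
std::array<long long, 2> toLongLong(std::string line)
{
  std::array<long long, 2> result{0, 0};
  std::istringstream iss(line);
  iss >> result[0] >> result[1];
  return result;
}
```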

Upvotes: 3

MateusMP

Reputation: 51

Read all input first, then parallelize the processing of that.

I'm not sure whether fgets() is thread-safe, but:

  • If it is, you won't gain much from parallelizing it, because of the locking needed to guarantee that no other thread is updating the stream (unless it uses a non-blocking approach, which I believe is not the case here).
  • If it isn't, you might end up reading the same data multiple times, or even reading corrupted data.

Another option is to split the file into pieces and process those separately. Either way, if the file is not too big, read it all first and then process it. If it is too big, load a batch of records (say, two thousand lines), process those in parallel, and repeat. You may need to do some testing to see which approach suits you better.
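The batch approach could look roughly like this. It is a sketch, not the asker's actual code: `process` is a hypothetical stand-in for the real per-line work, and the chunk size is arbitrary:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-line work; replace with the real conversion.
long long process(const std::string &line)
{
  return static_cast<long long>(line.size());
}

// Read a batch of lines serially, process the batch in parallel,
// repeat until EOF. Each loop index writes its own slot of `out`,
// so the parallel phase needs no locking.
std::vector<long long> process_in_chunks(std::istream &in, std::size_t chunk_size)
{
  std::vector<long long> results;
  std::vector<std::string> chunk;
  std::string line;

  for (;;)
  {
    // Serial phase: fill one chunk from the stream.
    chunk.clear();
    while (chunk.size() < chunk_size && std::getline(in, line))
      chunk.push_back(line);
    if (chunk.empty())
      break;

    // Parallel phase: independent per-index writes.
    std::vector<long long> out(chunk.size());
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(chunk.size()); ++i)
      out[i] = process(chunk[i]);

    results.insert(results.end(), out.begin(), out.end());
  }
  return results;
}
```

Without `-fopenmp` the pragma is simply ignored and the code still runs serially, which makes this pattern easy to test.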

Edit: As others have said, you should also check the concurrent access to those variables. To put it all together, maybe use a reduction to calculate the final result? See: C++ OpenMP Parallel For Loop - Alternatives to std::vector
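For the counter alone, an OpenMP `reduction` clause removes the data race entirely. A minimal sketch, assuming the lines are already in memory (`count_nonempty` is an illustrative name, not from the question):

```cpp
#include <string>
#include <vector>

// Each thread accumulates into a private copy of `c`; OpenMP sums
// the copies at the end of the parallel region.
long long count_nonempty(const std::vector<std::string> &lines)
{
  long long c = 0;
  #pragma omp parallel for reduction(+:c)
  for (long i = 0; i < static_cast<long>(lines.size()); ++i)
    if (!lines[i].empty())
      ++c;
  return c;
}
```

The vectors themselves cannot use a plain scalar reduction; for those, the per-thread buffers merged in a critical section (as in Henri Menke's answer) are the usual pattern.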

Upvotes: 3
