user497510

Reputation: 9

Performance of string streams versus file I/O streams in C++

I have to read in a huge text file (>200,000 words) and process each word. Currently I read the entire file into a string and then attach a string stream to it so I can easily process each word. The alternative is to extract each word directly from the file with >> and process it, but comparing the two approaches shows no advantage in execution time. Isn't it faster to operate on a string in memory than on a file that needs a system call every time I need a word? Please suggest some performance-enhancing methods.
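For reference, here is a minimal sketch of the two approaches being compared; the file name and the per-word processing step are placeholders:

#include <fstream>
#include <iterator>
#include <sstream>
#include <string>

// Approach 1: read the whole file into a string, then parse it via a string stream.
void parse_from_memory(const char* fname)
{
  std::ifstream in(fname);
  std::string contents((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
  std::istringstream words(contents);
  std::string word;
  while (words >> word)
  {
    // process(word);
  }
}

// Approach 2: extract each word directly from the file stream.
void parse_from_file(const char* fname)
{
  std::ifstream in(fname);
  std::string word;
  while (in >> word)
  {
    // process(word);
  }
}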

Upvotes: 1

Views: 2878

Answers (4)

Nim

Reputation: 33655

For performance and minimal copying, this is hard to beat (as long as you have enough memory!):

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <sstream>

void mapped(const char* fname)
{
  using namespace boost::interprocess;

  //Create a file mapping
  file_mapping m_file(fname, read_only);

  //Map the whole file with read permissions
  mapped_region region(m_file, read_only);

  //Get the address of the mapped region
  void * addr       = region.get_address();
  std::size_t size  = region.get_size();

  // Now you have the underlying data...
  char *data = static_cast<char*>(addr);

  std::stringstream localStream;
  localStream.rdbuf()->pubsetbuf(data, size);

  // now you can do your stuff with the stream
  // alternatively, work directly on the raw data buffer above
}
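As an illustration (not part of the original answer), the mapping could be consumed like this; note that the effect of pubsetbuf on a stringbuf is implementation-defined, so whether the copy is actually avoided depends on your standard library:

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstddef>
#include <sstream>
#include <string>

// Illustrative sketch: map the file as above, then extract whitespace-delimited
// words from the stream backed by the mapped buffer.
std::size_t count_words(const char* fname)
{
  using namespace boost::interprocess;

  file_mapping  m_file(fname, read_only);
  mapped_region region(m_file, read_only);

  std::stringstream localStream;
  localStream.rdbuf()->pubsetbuf(static_cast<char*>(region.get_address()),
                                 region.get_size());

  std::size_t n = 0;
  std::string word;
  while (localStream >> word)
    ++n;                 // replace with real per-word processing
  return n;
}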

Upvotes: 5

Jerry Coffin

Reputation: 490108

If you're going to put the data into a stringstream anyway, it's probably a bit faster and easier to copy directly from the input stream to the string stream:

std::ifstream infile("yourfile.txt");
std::stringstream buffer;

buffer << infile.rdbuf();

The ifstream will use a buffer, however, so while that's probably faster than reading into a string and then creating a stringstream, it may not be any faster than working directly from the input stream.
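For example, a minimal sketch combining that copy with the word loop (the file name and the per-word work are placeholders):

#include <fstream>
#include <sstream>
#include <string>

int main()
{
  std::ifstream infile("yourfile.txt");
  std::stringstream buffer;
  buffer << infile.rdbuf();   // copy the whole file into the string stream

  std::string word;
  while (buffer >> word)
  {
    // process(word);
  }
}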

Upvotes: 4

user325117

Reputation:

The string will get reallocated and copied an awful lot of times to accommodate 200,000 words. That's probably what is taking the time.

You should use a rope if you want to create a huge string by appending.
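If the big string really is built by repeated appends, one mitigation is to reserve capacity up front based on the file size (a different technique from the rope suggested here); a rough sketch:

#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Reserve the full file size once so appending into the string does not
// trigger repeated reallocations and copies.
std::string read_whole_file(const char* fname)
{
  std::ifstream in(fname, std::ios::binary);
  in.seekg(0, std::ios::end);
  std::string contents;
  contents.reserve(static_cast<std::size_t>(in.tellg()));
  in.seekg(0, std::ios::beg);
  contents.assign(std::istreambuf_iterator<char>(in),
                  std::istreambuf_iterator<char>());
  return contents;
}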

Upvotes: 1

Michael Goldshteyn

Reputation: 74360

The file stream is buffered, so it does not necessarily make a system call each time you extract a word. Having said that, you may get marginally better parse-time performance from parsing a single contiguous buffer. On the other hand, reading the entire file and then parsing serializes a workload that could potentially be parallelized (read and parse at the same time).
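A rough sketch of that overlap, assuming C++11 threads and a placeholder per-word step (an illustration, not code from the answer): one thread reads the file line by line while the main thread parses words from a shared queue.

#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>

int main()
{
  std::queue<std::string> chunks;
  std::mutex m;
  std::condition_variable cv;
  bool done = false;

  // Reader thread: pull lines from the file and hand them to the parser.
  std::thread reader([&] {
    std::ifstream in("yourfile.txt");          // placeholder file name
    std::string line;
    while (std::getline(in, line))
    {
      std::lock_guard<std::mutex> lock(m);
      chunks.push(line);
      cv.notify_one();
    }
    {
      std::lock_guard<std::mutex> lock(m);
      done = true;
    }
    cv.notify_one();
  });

  // Parser (main thread): consume lines as they arrive and split them into words.
  std::size_t words = 0;
  for (;;)
  {
    std::string line;
    {
      std::unique_lock<std::mutex> lock(m);
      cv.wait(lock, [&] { return !chunks.empty() || done; });
      if (chunks.empty())
        break;                                 // reader finished and queue drained
      line = std::move(chunks.front());
      chunks.pop();
    }
    std::istringstream ls(line);
    std::string word;
    while (ls >> word)
      ++words;                                 // placeholder for real per-word work
  }

  reader.join();
  return 0;
}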

Upvotes: 1
