Reputation: 9
I have to read in a huge text file (>200,000 words) and process each word. One approach reads the entire file into a string and attaches a string stream to it so that each word can be processed in memory. The other extracts each word directly from the file stream with >>. Comparing the two approaches gives me no measurable advantage in execution time. Isn't it faster to operate on a string in memory than on a file, which needs a system call every time I want a word? Please suggest some performance-enhancing methods.
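For reference, the two approaches being compared can be sketched as follows (the function names and the test file name are illustrative, not from the question):

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Approach 1: extract words straight from any input stream
// (pass an ifstream to read directly from the file).
std::size_t count_words(std::istream& in)
{
    std::size_t n = 0;
    std::string word;
    while (in >> word)
        ++n;
    return n;
}

// Approach 2: slurp the whole file into memory first, then
// parse the in-memory copy through a stringstream.
std::size_t count_words_buffered(const std::string& fname)
{
    std::ifstream file(fname);
    std::stringstream buffer;
    buffer << file.rdbuf();   // one bulk copy: file -> memory
    return count_words(buffer);
}
```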
Upvotes: 1
Views: 2878
Reputation: 33655
For performance and minimal copying, this is hard to beat (as long as you have enough memory!):
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <sstream>

void mapped(const char* fname)
{
    using namespace boost::interprocess;

    // Create a file mapping
    file_mapping m_file(fname, read_only);
    // Map the whole file with read permissions
    mapped_region region(m_file, read_only);

    // Get the address and size of the mapped region
    void* addr = region.get_address();
    std::size_t size = region.get_size();

    // Now you have the underlying data...
    char* data = static_cast<char*>(addr);
    std::stringstream localStream;
    localStream.rdbuf()->pubsetbuf(data, size);
    // now you can do your stuff with the stream
}
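One caveat: pubsetbuf on a std::stringbuf is implementation-defined and a no-op in some standard libraries. A portable zero-copy alternative is a tiny custom streambuf over the mapped bytes; a minimal sketch (the membuf name is illustrative, and the buffer must outlive the stream):

```cpp
#include <istream>
#include <streambuf>

// Read-only streambuf over an existing buffer; no copy is made.
struct membuf : std::streambuf
{
    membuf(char* data, std::size_t size)
    {
        setg(data, data, data + size);  // get area spans the whole buffer
    }
};

// Usage with the mapped region from above:
//   membuf buf(data, size);
//   std::istream in(&buf);
//   std::string word;
//   while (in >> word) { /* process word */ }
```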
Upvotes: 5
Reputation: 490108
If you're going to put the data into a stringstream anyway, it's probably a bit faster and easier to copy directly from the input stream to the string stream:
std::ifstream infile("yourfile.txt");
std::stringstream buffer;
buffer << infile.rdbuf();
The ifstream will buffer its reads, though, so while this is probably faster than reading into a string and then constructing a stringstream from it, it may not be any faster than working directly from the input stream.
Upvotes: 4
Reputation:
The string will be reallocated and copied many times as it grows to accommodate 200,000 words. That is probably where the time goes.
If you want to build a huge string by appending, use a rope, or at least reserve the capacity up front.
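Note that rope is an SGI extension, not part of the standard library. With plain std::string, reserving the full file size once avoids the repeated reallocation; a sketch (the slurp name is illustrative):

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file into a string, reserving its size up front
// so the string never reallocates while contents are appended.
std::string slurp(const std::string& fname)
{
    std::ifstream file(fname, std::ios::binary);
    std::string contents;
    file.seekg(0, std::ios::end);
    contents.reserve(static_cast<std::size_t>(file.tellg()));
    file.seekg(0, std::ios::beg);
    contents.assign(std::istreambuf_iterator<char>(file),
                    std::istreambuf_iterator<char>());
    return contents;
}
```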
Upvotes: 1
Reputation: 74360
There is caching involved, so the stream does not necessarily make a system call each time you extract a word. That said, you may get marginally better performance at parse time by parsing a single contiguous buffer. On the other hand, reading the entire file and then parsing it serializes the workload, whereas reading and parsing from the stream can potentially overlap.
Upvotes: 1