Mohan

Reputation: 8853

Reading large strings in C++ -- is there a safe fast way?

http://insanecoding.blogspot.co.uk/2011/11/how-to-read-in-file-in-c.html reviews a number of ways of reading an entire file into a string in C++. The key code for the fastest option looks like this:

std::string contents;
in.seekg(0, std::ios::end);
contents.resize(in.tellg());
in.seekg(0, std::ios::beg);
in.read(&contents[0], contents.size());

Unfortunately, this is not safe, as it relies on the string being implemented in a particular way. If, for example, the implementation were sharing strings, then modifying the data at &contents[0] could affect strings other than the one being read. (More generally, there's no guarantee that this won't trash arbitrary memory -- it's unlikely to happen in practice, but it's not good practice to rely on that.)

C++ and the STL are designed to provide features that are as efficient as C, so one would expect there to be a version of the above that is just as fast but guaranteed to be safe.

In the case of vector<T>, there are functions that give direct access to the raw data:

T* vector::data();
const T* vector::data() const; 

The first of these can be used to read a vector<T> efficiently. Unfortunately, the string equivalent only provides the const variant:

const char* string::data() const noexcept;

So this cannot be used to read a string efficiently. (Presumably the non-const variant is omitted to support the shared string implementation.)
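
For comparison, here is a minimal sketch of the vector<char> version (file name hypothetical, error handling omitted), which is safe because writing through data() is well defined:

#include <fstream>
#include <vector>

int main() {
    std::ifstream in("file.txt", std::ios::binary);   // hypothetical file name
    in.seekg(0, std::ios::end);
    std::vector<char> contents(in.tellg());           // size the buffer to the file length
    in.seekg(0, std::ios::beg);
    in.read(contents.data(), contents.size());        // data() is non-const, so this is well defined
}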

I have also checked the string constructors, but the ones that accept a char* copy the data -- there's no option to move it.

Is there a safe and fast way of reading the whole contents of a file into a string?

It may be worth noting that I want to read a string rather than a vector<char> so that I can access the resulting data using an istringstream. There's no equivalent of that for vector<char>.

Upvotes: 6

Views: 2552

Answers (2)

mooware

Reputation: 1762

I think using &string[0] is just fine, and it should work with the widely used standard library implementations (even if it is technically UB).

But since you mention that you want to put the data into an istringstream, here's an alternative:

  1. Read the data into a char array (new char[in.tellg()])
  2. Construct a stringstream (without the leading 'i')
  3. Insert the data with stringstream::write

The istringstream would have to copy the data anyway, because a std::stringstream doesn't store a std::string internally as far as I'm aware, so you can skip the std::string entirely and write the data into the stream directly.
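
A minimal sketch of those three steps (file name hypothetical, error handling omitted):

#include <fstream>
#include <sstream>

int main() {
    std::ifstream in("file.txt", std::ios::binary);   // hypothetical file name
    in.seekg(0, std::ios::end);
    std::streamsize size = in.tellg();
    in.seekg(0, std::ios::beg);

    char* buf = new char[size];   // step 1: plain char array (or make_unique)
    in.read(buf, size);

    std::stringstream ss;         // step 2: stringstream, without the leading 'i'
    ss.write(buf, size);          // step 3: insert the data with write
    delete[] buf;                 // the temporary buffer is no longer needed

    int value;
    ss >> value;                  // then parse from the stream as usual
}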

EDIT: Actually, instead of the manual allocation (or make_unique), this approach would also work with the vector<char> you mentioned.

Upvotes: 1

paddy

Reputation: 63481

If you really want to avoid copies, you can slurp the file into a std::vector<char>, and then roll your own std::basic_stringbuf to pull data from the vector.

You can then declare a std::istringstream and use std::basic_ios::rdbuf to replace the input buffer with your own.
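
A minimal sketch of that idea; it derives directly from std::streambuf and hands the buffer to a plain std::istream instead of swapping an istringstream's rdbuf, but the effect is the same (names and file path are hypothetical):

#include <fstream>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// A read-only stream buffer whose get area points straight into an existing vector.
struct vector_streambuf : std::streambuf {
    explicit vector_streambuf(std::vector<char>& v) {
        setg(v.data(), v.data(), v.data() + v.size());
    }
};

int main() {
    std::ifstream in("file.txt", std::ios::binary);   // hypothetical file name
    in.seekg(0, std::ios::end);
    std::vector<char> buf(in.tellg());
    in.seekg(0, std::ios::beg);
    in.read(buf.data(), buf.size());

    vector_streambuf sbuf(buf);
    std::istream is(&sbuf);       // a stream over the vector, with no second copy

    std::string word;
    while (is >> word) { /* parse as you would from an istringstream */ }
}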

The caveat is that if you choose to call istringstream::str it will invoke std::basic_stringbuf::str and will require a copy. But then, it sounds like you won't be needing that function, and can actually stub it out.

Whether you get better performance this way would require actual measurement. But at least you avoid needing two large contiguous memory blocks during the copy. Additionally, you could use something like std::deque as your underlying structure if you want to cope with truly huge files that cannot be allocated in contiguous memory.

It's also worth mentioning that if you're really just streaming that data, you are essentially double-buffering by reading it into a string first. Unless you also require the contents in memory for some other purpose, the buffering inside std::ifstream is likely to be sufficient. If you do slurp the file, you may get a boost by turning buffering off.
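
For that last point, a small sketch of one way to turn buffering off (call pubsetbuf before opening, since setbuf(0, 0) before any I/O requests an unbuffered filebuf):

#include <fstream>

int main() {
    std::ifstream in;
    in.rdbuf()->pubsetbuf(nullptr, 0);        // request an unbuffered filebuf before any I/O
    in.open("file.txt", std::ios::binary);    // hypothetical file name
    // ... slurp the file as before ...
}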

Upvotes: 2
