Reputation: 8853
http://insanecoding.blogspot.co.uk/2011/11/how-to-read-in-file-in-c.html reviews a number of ways of reading an entire file into a string in C++. The key code for the fastest option looks like this:
std::string contents;
in.seekg(0, std::ios::end);
contents.resize(in.tellg());
in.seekg(0, std::ios::beg);
in.read(&contents[0], contents.size());
Unfortunately, this is not safe as it relies on the string being implemented in a particular way. If, for example, the implementation were sharing strings, then modifying the data at &contents[0] could affect strings other than the one being read. (More generally, there's no guarantee that this won't trash arbitrary memory -- it's unlikely to happen in practice, but it's not good practice to rely on that.)
C++ and the STL are designed to provide features that are as efficient as C, so one would expect there to be a version of the above that is just as fast but guaranteed to be safe.
In the case of vector<T>, there are functions that give access to the raw data:
T* vector::data();
const T* vector::data() const;
The first of these can be used to read a vector<T> efficiently. Unfortunately, the string equivalent only provides the const variant:
const char* string::data() const noexcept;
So this cannot be used to read a string efficiently. (Presumably the non-const variant is omitted to support the shared string implementation.)
I have also checked the string constructors, but the ones that accept a char* copy the data -- there's no option to move it.
Is there a safe and fast way of reading the whole contents of a file into a string?
It may be worth noting that I want to read a string rather than a vector<char> so that I can access the resulting data using an istringstream. There's no equivalent of that for vector<char>.
Upvotes: 6
Views: 2552
Reputation: 1762
I think using &string[0] is just fine, and it should work with the widely used standard library implementations (even if it is technically UB before C++11, which did not guarantee contiguous string storage).
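To make this concrete, here is a minimal sketch of the &string[0] approach with basic error checks added. It assumes a C++11 or later library, where std::string storage is required to be contiguous; the function name read_file is hypothetical and not from the linked article.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a std::string.
// Assumes C++11 or later, where std::string storage is contiguous.
std::string read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    // Find the file size by seeking to the end.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    // Allocate once, then read straight into the string's own storage.
    std::string contents(static_cast<std::size_t>(size), '\0');
    if (size > 0)
        in.read(&contents[0], static_cast<std::streamsize>(size));

    // Keep only what was actually read, in case the read came up short.
    contents.resize(static_cast<std::size_t>(in.gcount()));
    return contents;
}
```

The result can then be handed to an istringstream, for example std::istringstream iss(read_file("input.txt")), at the cost of the copy discussed below.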
But since you mention that you want to put the data into an istringstream, here's an alternative: allocate a buffer manually (new char[in.tellg()]), read the file into it, and write it into a stringstream (without the leading 'i').
The istringstream would have to copy the data anyway, because a std::stringstream doesn't store a std::string internally as far as I'm aware, so you can leave the std::string out and put the data into the stream directly.
EDIT: Actually, instead of the manual allocation (or make_unique), this way you could also use the vector<char> you mentioned.
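A rough sketch of that vector<char> variant, assuming C++11 or later. The name load_into_stream is hypothetical, and the stream is passed by reference so the sketch does not depend on stringstream being movable.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a file into a temporary vector<char>,
// then write those bytes into the caller's stringstream.
void load_into_stream(const std::string& path, std::stringstream& out)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    // vector::data() gives guaranteed write access to contiguous storage.
    std::vector<char> buffer(static_cast<std::size_t>(size));
    if (!buffer.empty())
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));

    // This write is the one copy; the vector is released on return.
    out.write(buffer.data(), in.gcount());
}
```

After the call, the stringstream can be read with the usual >> and getline operations.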
Upvotes: 1
Reputation: 63481
If you really want to avoid copies, you can slurp the file into a std::vector<char>, and then roll your own std::basic_stringbuf to pull data from the vector.
You can then declare a std::istringstream and use std::basic_ios::rdbuf to replace the input buffer with your own one.
The caveat is that if you choose to call istringstream::str it will invoke std::basic_stringbuf::str and will require a copy. But then, it sounds like you won't be needing that function, and can actually stub it out.
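One way this idea could look is sketched below. Note that it swaps in a simpler technique: the buffer derives from std::streambuf rather than std::basic_stringbuf, and it is attached to a plain std::istream instead of an istringstream. The class name vector_streambuf is hypothetical. The sketch is read-only and does not implement seeking, so seekg on the stream will fail.

```cpp
#include <istream>
#include <streambuf>
#include <vector>

// Hypothetical read-only stream buffer that exposes a vector<char> in place.
// The vector must outlive the buffer and must not be resized while in use.
class vector_streambuf : public std::streambuf
{
public:
    explicit vector_streambuf(std::vector<char>& data)
    {
        // Point the get area directly at the vector's storage; no copy is made.
        char* begin = data.data();
        setg(begin, begin, begin + data.size());
    }
    // The inherited underflow() reports end-of-file once the get area is used up.
};
```

With a vector named data already filled from the file, vector_streambuf buf(data); std::istream in(&buf); gives a stream that supports the usual formatted extraction.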
Whether you get better performance this way would require actual measurement. But at least you avoid having to have two large contiguous memory blocks during the copy. Additionally, you could use something like std::deque as your underlying structure if you want to cope with truly huge files that cannot be allocated in contiguous memory.
It's also worth mentioning that if you're really just streaming that data you are essentially double-buffering by reading it into a string first. Unless you also require the contents in memory for some other purpose, the buffering inside std::ifstream is likely to be sufficient. If you do slurp the file, you may get a boost by turning buffering off.
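As a sketch of turning buffering off, the stream's buffer can be asked to go unbuffered with pubsetbuf before the file is opened. The helper name slurp_unbuffered is hypothetical, and the effect of the request depends on the standard library implementation, so any gain would need to be measured.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: slurp a file after asking the stream not to buffer.
std::vector<char> slurp_unbuffered(const std::string& path)
{
    std::ifstream in;
    // Request an unbuffered stream. This is done before any I/O on the
    // stream; how much it helps (if at all) depends on the implementation.
    in.rdbuf()->pubsetbuf(nullptr, 0);
    in.open(path, std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    // One large read directly into the destination storage.
    std::vector<char> data(static_cast<std::size_t>(size));
    if (!data.empty())
        in.read(data.data(), static_cast<std::streamsize>(data.size()));

    // Keep only what was actually read.
    data.resize(static_cast<std::size_t>(in.gcount()));
    return data;
}
```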
Upvotes: 2