foboi1122

Reputation: 1757

Best way to read 12-15GB ASCII file in C++

I am trying to count the number of lines in a huge file. This ASCII file is anywhere from 12-15 GB. Right now, I am using something along the lines of readline() to count each line of the file, but of course this is extremely slow. I've also tried to implement lower-level reading using seekg() and tellg(), but due to the size of my file I am unable to allocate an array large enough to hold every character for a '\n' comparison (I have 8 GB of RAM). What would be a faster way of reading this ridiculously large file? I've looked through many posts here, and most people don't seem to have trouble with the 32-bit system limitation, but here I see that as a problem (correct me if I'm wrong).

Also, if anyone can recommend a good way of splitting something this large, that would be helpful as well.

Thanks!

Upvotes: 3

Views: 731

Answers (4)

Remus Rusanu

Reputation: 294267

Try Boost Memory-Mapped Files; a single codebase works on both Windows and POSIX platforms.

Upvotes: 4

John Gardner

Reputation: 25126

What OS are you on? Is there no wc -l or equivalent command on that platform?

Upvotes: 0

Greg Hewgill

Reputation: 993085

Memory-mapping a file does not require that you actually have enough RAM to hold the whole file. I've used this technique successfully with files up to 30 GB (I think I had 4 GB of RAM in that machine). You will need a 64-bit OS and 64-bit tools (I was using Python on FreeBSD) in order to be able to address that much.

Using a memory-mapped file significantly increased performance over explicitly reading chunks of the file.
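On a POSIX system, the technique described above can be sketched with the raw mmap() call (the function name is mine, and error handling is kept minimal):

```cpp
#include <algorithm>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map the whole file read-only; the kernel faults pages in on demand,
// so physical RAM never needs to hold the entire file. Requires a
// 64-bit process to address a 12-15 GB mapping.
std::size_t count_lines_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;
    struct stat st;
    fstat(fd, &st);
    std::size_t n = 0;
    if (st.st_size > 0) {
        void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            const char* c = static_cast<const char*>(p);
            n = std::count(c, c + st.st_size, '\n');
            munmap(p, st.st_size);
        }
    }
    close(fd);
    return n;
}
```

On Linux, madvise(MADV_SEQUENTIAL) on the mapping may further help the kernel's read-ahead for a single sequential pass like this.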

Upvotes: 3

Billy ONeal

Reputation: 106539

Don't try to read the whole file at once. If you're counting lines, just read in chunks of a given size. A couple of MB should be a reasonable buffer size.
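A minimal sketch of that chunked approach (the 1 MB buffer size and the function name are my choices, not prescribed):

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <vector>

// Stream the file through a fixed-size buffer, so memory use stays
// constant no matter how large the file is.
std::size_t count_lines_chunked(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(1 << 20); // 1 MB buffer
    std::size_t n = 0;
    while (in) {
        in.read(buf.data(), buf.size());
        // gcount() reports how many bytes the last read() actually got,
        // which handles the final partial chunk correctly.
        n += std::count(buf.data(), buf.data() + in.gcount(), '\n');
    }
    return n;
}
```

Opening in binary mode avoids any newline translation, so the '\n' count matches the file's actual bytes.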

Upvotes: 6
