Reputation: 358
I have to read and process a 50 GB file and want to do it chunk by chunk, say with a buffer size of 5 GB. The problem is that each row has a different format with a different number of parameters. A sample snippet:
4 A 5 7
1 2 B 7 9 10
1 3 B 14 755 9874
5 A 2 7
...
So I can't simply call fread(...) with a read size of 5 GB, as the read would probably end in the middle of a number. Instead, I want to read the maximum number of whole lines that fit in the buffer, so that the buffer ends at a '\n'.
A possible solution would be to read, say, 1000 bytes less than 5 GB on the first read, and then keep re-reading from the start of the file, increasing the read size by one byte each time until the last byte read is '\n'. But this takes many more reads, so I wanted to know whether there is a more efficient solution.
EDIT:
I use this simple code:
#include <iostream>
#include <cstdio>
using namespace std;
int main()
{
FILE* fp = fopen("outit", "r");
char *s = new char[1000];
fread(s,1,1000,fp);
cout<<s;
}
A small sample file has only these lines:
this is a line
this is another line
again another one
more another
But still, the output is:
this is a line
this is another line
again another one
more anotheram Files (x86)\CodeBlocks\MinGW\bin;C
:\WINDOWS\system32;C:\WINDO WS;C:\WINDOW
S\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Progr am Files\Microsoft SQL Server\110\Tools\Binn\;D:\Program Files\MATLAB\R2012b\run time\win64;D:\Program Files\MATLAB\R2012b\bin;C:\Program Files (x86)\Microsoft A SP.NET\ASP.NET Web Pages\v1.0\;C:\Program Files (x86)\Windows Kits\8.0\Windows P erformance Toolkit\;C:\Program Files (x86)\MySQL\MySQL Utilities 1.3.4\
What is that garbage, and why is it being printed?
Upvotes: 0
Views: 82
Reputation: 55395
'\n' (starting the search from behind). This will be the logical end of your buffer.
Edit:
The garbage in the output appears because the buffer is initially uninitialized and contains garbage, and because there is no terminating NUL character to tell cout when to stop printing.
When you call fread and don't know exactly how much input you'll get, you need to check its return value, which tells you how many characters were actually read. You can use it to place the NUL terminator accordingly:
size_t n = fread(s, 1, 999, fp);  // read at most 999 bytes so s[n] stays inside the 1000-byte buffer
s[n] = '\0';
cout << s;
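For the original chunking problem, the backward search for '\n' described at the top of this answer could look roughly like the sketch below. It is only an illustration under some assumptions: the helper name for_each_line_chunk and its callback signature are made up for this example, the partial line after the last '\n' is carried over to the front of the buffer before the next fread, and the buffer is doubled if a single line does not fit (that growth policy is a choice of this sketch, not something the question requires).

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are assumptions for this sketch).
// Reads `fp` in pieces of at most `chunk_size` bytes and passes each piece,
// cut at the last '\n' it contains, to `handle`. Returns the number of pieces.
std::size_t for_each_line_chunk(
    std::FILE* fp, std::size_t chunk_size,
    const std::function<void(const char*, std::size_t)>& handle)
{
    assert(chunk_size > 0);
    std::vector<char> buf(chunk_size);
    std::size_t carried = 0;  // bytes of an unfinished line kept from the last read
    std::size_t chunks = 0;

    for (;;) {
        // fread reports how many bytes it actually stored; never assume a full read.
        std::size_t n = std::fread(buf.data() + carried, 1, buf.size() - carried, fp);
        std::size_t filled = carried + n;
        if (filled == 0) break;               // nothing left at all

        if (n == 0) {                         // EOF (or error): flush the unterminated tail
            handle(buf.data(), filled);
            ++chunks;
            break;
        }

        // Search from the back for the last '\n'; `end` is one past it.
        std::size_t end = filled;
        while (end > 0 && buf[end - 1] != '\n') --end;

        if (end == 0) {                       // no complete line in the buffer yet
            carried = filled;
            if (filled == buf.size()) buf.resize(buf.size() * 2);  // line longer than buffer
            continue;
        }

        handle(buf.data(), end);              // logical end of this chunk
        ++chunks;

        // Move the partial line to the front so the next read completes it.
        carried = filled - end;
        std::memmove(buf.data(), buf.data() + end, carried);
    }
    return chunks;
}
```

A call might look like for_each_line_chunk(fp, chunk_size, callback), where the callback receives a pointer and a length rather than a NUL-terminated string. Each line is read from disk once, so there is no need to seek back and re-read. On Windows it is probably safer to open the file with "rb" so that the byte counts are not affected by line-ending translation, and a 5 GB buffer needs a 64-bit build.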
Upvotes: 2