Reputation: 57
I've spent days looking for a solution to some key problems I'm running into, and I haven't found a good answer yet.
I'm embarking on an academic (learning) project that involves regularly reading 3-50MB plain-text files, eventually across millions of records (my current set is ~800,000 records).
Assuming the file can't be split() into chunks, what's the best way to pass this chunk between functions? Pass-by-value leads me to think (and, I believe, see) that passing a 50MB file to a function, and returning a 20-30MB result set, means I have wasted over 100MB of memory just passing the file around, which is waiting to be reclaimed at GC. (Technically, the file can be split(), but those pieces are sometimes 10MB each, and each must be held in memory while processing.)
I've made significant changes to my overall project recently, and I want to design the processing portion better this time. My previous method mostly read and processed the data in the driver itself, without a data container. When I attempted to use a data container, I ended up with similar results. Here's the first method I used:
I can probably split as I read; however, even those splits can be 5MB each (or more), and I need to keep most of them in memory until the file is done processing (in case step 3 changes how step 4 works). Even worse, some lines returned by readLine() might be 1-2MB long themselves (before the \n).
So, what kind of design strategy would be best for handling these huge input files, and huge strings?
Upvotes: 0
Views: 54
Reputation: 11132
Pass-by-value leads me to think (and, I believe, see) that passing a 50MB file to a function, and returning a 20-30MB result set, means I have wasted over 100MB of memory just passing the file around, which is waiting to be reclaimed at GC.
Incorrect. Java passes references by value, not the entire String. What I would do is pass the string (that is, a reference to it) along with the start and end indices of the section of the string you want to process.
void read()
{
    String input = /*your code here*/;
    process(input, 37, 17576);
}
void process(String input, int startIndex, int endIndex)
{
    /*your code here, e.g.
    for(int i = startIndex; i < endIndex; i++)
    {
        //do stuff
    }*/
}
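To check the claim that no copy is made, here is a minimal sketch. The class name ReferencePassingDemo and the helper passThrough are hypothetical, and the one-million-character string is just a stand-in for a large file's contents. It compares object identity before and after a call:

```java
// Hypothetical demo class; the names are illustrative, not from the question.
class ReferencePassingDemo {
    // Receives a reference to the String and hands the same reference back.
    // The character data is never copied by the call or by the return.
    static String passThrough(String s) {
        return s;
    }

    public static void main(String[] args) {
        // Stand-in for a large file's contents: one million '\0' characters.
        String input = new String(new char[1_000_000]);

        String returned = passThrough(input);

        // == compares object identity here, not contents.
        System.out.println(returned == input); // prints "true"
    }
}
```

Because the identity comparison is true, the method received and returned the very same object. Only the reference was copied, so the cost of the call does not depend on the string's length.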
Also, if read and process are in the same class, you can just make the string a class field:
String input;
void read()
{
    input = /*your code here*/;
    process(37, 17576);
}
void process(int startIndex, int endIndex)
{
    /*your code here, e.g.
    for(int i = startIndex; i < endIndex; i++)
    {
        //do stuff
    }*/
}
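As a runnable illustration of the index-range idea, the sketch below counts a character within a section of a string without ever creating a substring. The class RangeProcessor, the method countChar, and the sample text are all assumptions made for the example. Your real processing step would replace the loop body.

```java
// Hypothetical example class; countChar stands in for whatever "do stuff" is.
class RangeProcessor {
    // Counts occurrences of target in input[startIndex, endIndex).
    // Only the reference and two ints are passed; no substring is allocated.
    static int countChar(String input, int startIndex, int endIndex, char target) {
        int count = 0;
        for (int i = startIndex; i < endIndex; i++) {
            if (input.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "alpha\nbeta\ngamma\n"; // small stand-in for file contents

        // Whole string: three newline characters.
        System.out.println(countChar(input, 0, input.length(), '\n')); // prints 3

        // Indices 6..10 cover "beta\n", which contains one 'a'.
        System.out.println(countChar(input, 6, 11, 'a')); // prints 1
    }
}
```

The same pattern works for the class-field variant: drop the input parameter and read from the field instead.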
Upvotes: 2