Reputation: 157
I'm using StringBuilder.append() to parse and process a file as follows:
StringBuilder csvString = new StringBuilder();
bufferedReader.lines()
    .filter(line -> !line.startsWith(HASH) && !line.isEmpty())
    .map(line -> line.trim())
    .forEachOrdered(line -> csvString.append(line).append(System.lineSeparator()));
int startOfFileTagIndex = csvString.indexOf(START_OF_FILE_TAG);
int startOfFieldsTagIndex = csvString.indexOf(START_OF_FIELDS_TAG, startOfFileTagIndex);
int endOfFieldsTagIndex = csvString.indexOf(END_OF_FIELDS_TAG, startOfFieldsTagIndex);
int startOfDataTagIndex = csvString.indexOf(START_OF_DATA_TAG, endOfFieldsTagIndex);
int endOfDataTagIndex = csvString.indexOf(END_OF_DATA_TAG, startOfDataTagIndex);
int endOfFileTagIndex = csvString.indexOf(END_OF_FILE_TAG, endOfDataTagIndex);
int timeStartedIndex = csvString.indexOf("TIMESTARTED", endOfFieldsTagIndex);
int dataRecordsIndex = csvString.indexOf("DATARECORDS", endOfDataTagIndex);
int timeFinishedIndex = csvString.indexOf("TIMEFINISHED", endOfDataTagIndex);
if (startOfFileTagIndex != 0 || startOfFieldsTagIndex == -1 || endOfFieldsTagIndex == -1
        || startOfDataTagIndex == -1 || endOfDataTagIndex == -1 || endOfFileTagIndex == -1) {
    log.error("not in correct format");
    throw new Exception("not in correct format.");
}
The problem is that when the file is quite large I get an OutOfMemoryError. Can you help me transform my code to avoid that error with large files?
Edit: As I understand it, loading a huge file into a StringBuilder is not a good idea and won't work. So the question is: which structure in Java is the most appropriate for parsing my huge file, deleting some lines, finding the index of some lines, and separating the file into parts according to the found indexes (and where should those parts, which can be huge, be stored), then creating an output file at the end?
Upvotes: 0
Views: 330
Reputation: 2245
The OOM seems to be due to the fact that you are storing all lines in the StringBuilder. When the file has too many lines, it will take up a huge amount of memory and may lead to OOM.
The strategy to avoid this depends upon what you are doing with appended strings.
As far as I can see from your code, you are only trying to verify the structure of the input file. In that case, you don't need to store all the lines in a StringBuilder instance. Instead,
- Have ints to hold each index you are interested in (or have an array of ints).
- Instead of appending each line to the StringBuilder, detect the presence of the "tag" or "index" you are looking for and save it in its designated int variable.
- Check each index not only against -1 but relative to the other indices. (You currently achieve this by passing a start index to the indexOf() call.)
- If a tag can span more than one line, you may need a for loop in which to save some previous lines, append them and check. (Just one idea; you may have a better one.)
Upvotes: 1
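The streaming check suggested in the answer above could look roughly like the following. This is only a minimal sketch, not the asker's actual code: the class name TagOrderValidator, the method findTagLines, and the idea of passing the tags as an ordered list are all hypothetical, and the real values of START_OF_FILE_TAG and the other constants are not shown in the question. It also records the number of the retained line on which each tag was found rather than a character offset, and it trims each line before filtering, which differs slightly from the original filter-then-trim order.

```java
import java.io.BufferedReader;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: validates tag order while holding only one line in memory.
final class TagOrderValidator {

    private TagOrderValidator() {
    }

    /**
     * Reads the input line by line and checks that the tags in expectedTags
     * occur in that order. Returns, for each tag, the 0-based number of the
     * retained (non-empty, non-comment) line on which it was found.
     * Throws IllegalArgumentException if the structure is not as expected.
     */
    static long[] findTagLines(BufferedReader reader, List<String> expectedTags, String commentPrefix) {
        long[] foundAt = new long[expectedTags.size()];
        int next = 0;     // position in expectedTags of the tag we are waiting for
        long lineNo = 0;  // counts retained lines only

        Iterator<String> lines = reader.lines().iterator();
        // Stop reading as soon as every tag has been seen.
        while (lines.hasNext() && next < expectedTags.size()) {
            String line = lines.next().trim();
            if (line.isEmpty() || line.startsWith(commentPrefix)) {
                continue; // skipped lines are never stored
            }
            int from = 0;
            // A single line may contain several consecutive tags.
            while (next < expectedTags.size()) {
                int pos = line.indexOf(expectedTags.get(next), from);
                if (pos < 0) {
                    break;
                }
                // Mirrors the original "startOfFileTagIndex != 0" check.
                if (next == 0 && (lineNo != 0 || pos != 0)) {
                    throw new IllegalArgumentException("not in correct format: first tag must open the file");
                }
                foundAt[next++] = lineNo;
                from = pos; // like indexOf(tag, previousIndex) in the question
            }
            lineNo++;
        }
        if (next < expectedTags.size()) {
            throw new IllegalArgumentException("not in correct format: missing " + expectedTags.get(next));
        }
        return foundAt;
    }
}
```

It could be called with a reader obtained from Files.newBufferedReader(path) and the list of tags in their required order. Because each line is discarded after it is inspected, memory use stays roughly constant regardless of file size. A tag that is split across two lines would not be detected by this sketch.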