Reputation: 309
I've found a few other questions on SO that are close to what I need, but I can't figure this out. I'm reading a text file line by line and getting an out-of-memory error. Here's the code:
System.out.println("Total memory before read: " + Runtime.getRuntime().totalMemory()/1000000 + "MB");
String wp_posts = new String();
try(Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)){
    wp_posts = stream
            .filter(line -> line.startsWith("INSERT INTO `wp_posts`"))
            .collect(StringBuilder::new, StringBuilder::append,
                    StringBuilder::append)
            .toString();
} catch (Exception e1) {
    System.out.println(e1.getMessage());
    e1.printStackTrace();
}
try {
    System.out.println("wp_posts Mega bytes: " + wp_posts.getBytes("UTF-8").length/1000000);
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
System.out.println("Total memory after read: " + Runtime.getRuntime().totalMemory()/1000000 + "MB");
The output looks like this (when run in an environment with more memory):
Total memory before read: 255MB
wp_posts Mega bytes: 18
Total memory after read: 1035MB
Note that in my production environment I cannot increase the heap size.
I've tried explicitly closing the stream, calling gc, and putting the stream in parallel mode (which consumed more memory).
My questions are: Is this amount of memory usage expected? Is there a way to use less memory?
Upvotes: 1
Views: 729
Reputation: 9283
The way you calculated memory is incorrect for the following reasons:
1. Runtime.totalMemory() is the heap the JVM has currently committed, not the heap that is in use; the JVM grows the heap in chunks, so part of it may be unused. To get the memory actually in use, compute:
Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()
2. Part of that used memory may be garbage that the JVM can still reclaim before it gives up with an OutOfMemoryError. So, for testing, call
System.gc()
before calculating used memory. Of course, you don't call gc in production, and calling gc does not guarantee that the JVM will actually run a garbage collection. But for testing purposes, I think it works well.
3. If the error occurs while the stream is still being collected, the final String has not been formed yet and the StringBuilder is still strongly referenced. Call the capacity() method of the StringBuilder to get the actual number of char elements in its internal array, then multiply it by 2 to get the number of bytes, because Java (up to version 8) stores these chars as UTF-16, which needs 2 bytes even for an ASCII character. (Since Java 9, compact strings let String and StringBuilder store Latin-1-only content at 1 byte per char.)
4. Because your code does not give the StringBuilder an initial capacity, every time the StringBuilder runs out of space it roughly doubles its internal array (new capacity = 2 * old + 2) by allocating a new array and copying the content. While that copy runs, the old and the new array coexist, so about three times the size of the content is allocated at once. You cannot measure this, because it happens inside the StringBuilder class, and once control returns from StringBuilder the old array is already eligible for garbage collection. So there is a high chance that the OutOfMemoryError is thrown at exactly that point in StringBuilder, when it tries to allocate the doubled array, or more specifically in the Arrays.copyOf method.

Let's consider a program similar to yours.
import java.util.ArrayList;
import java.util.List;

public class CapacityDemo {
    public static void main(String[] args) {
        // Initialize the list to emulate a file with 32 lines,
        // each containing 1000 ASCII characters
        List<String> strList = new ArrayList<>(32);
        for (int i = 0; i < 32; i++) {
            strList.add(String.format("%01000d", i)); // zero-padded to 1000 chars
        }
        StringBuilder str = new StringBuilder();
        strList.stream().map(element -> {
            // Print the number of chars reserved by the StringBuilder,
            // just before this element is appended to it
            System.out.print(str.capacity() + ", ");
            return element;
        }).collect(() -> str,
                (response, element) -> response.append(element),
                (response, other) -> response.append(other))
          .toString();
    }
}
Here, just before each element is appended, the program prints the current capacity of the StringBuilder.
The output of the program is as follows:
16, 1000, 2002, 4006, 4006, 8014, 8014, 8014, 8014,
16030, 16030, 16030, 16030, 16030, 16030, 16030, 16030,
32062, 32062, 32062, 32062, 32062, 32062, 32062, 32062,
32062, 32062, 32062, 32062, 32062, 32062, 32062,
If your file has n lines (where n is a power of 2, i.e. n = 2 ^ a) and each line has m ASCII characters (with m larger than 34, so the first append sets the capacity to exactly m), the capacity of the StringBuilder at the end of the program will be: (n * m + 2 ^ (a + 1) - 2). For the program above (n = 32, m = 1000) that is 32000 + 64 - 2 = 32062, the last value printed.
E.g. if your file has 256 lines with 1500 ASCII characters per line, the total capacity of the StringBuilder at the end of the program will be: (256 * 1500 + 2 ^ 9 - 2) = 384,510 characters.
Assuming you have only ASCII characters in your file, each character occupies 2 bytes in the UTF-16 representation used by Java 8. Additionally, every time the StringBuilder's array runs out of space, a new array of twice the size plus 2 is created (see the capacity growth numbers above), the content of the old array is copied into it, and the old array is left for garbage collection. Therefore, if you append more than the 2 ^ (a + 1) - 2 = 510 spare characters, the StringBuilder creates a new array holding 2 * 384,510 + 2 = 769,022 characters and starts copying the content of the old array into it. Thus, there will be two big arrays within the StringBuilder while the copying goes on.
So the total memory will be: 384,510 * 2 + 769,022 * 2 = 2,307,064 bytes, about 2.2 MB, to hold only about 0.7 MB of data (384,000 characters * 2 bytes = 768,000 bytes).
I have ignored other memory-consuming items like array headers, object headers, references, etc., as those are negligible or constant compared to the array size.
So, in conclusion, 256 lines of 1500 characters each consume about 2.2 MB to hold only about 0.7 MB of data (roughly one third).
If you had initialized the StringBuilder with the required capacity (384,000 characters here) at the beginning, you could have held the same characters in about one third of the memory, and there would have been much less work for the CPU in terms of array copying and garbage collection.
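To work out the final capacity for your own line count and line length without pushing a real file through a stream, here is a minimal sketch that replays the growth rule arithmetically. It assumes the OpenJDK growth rule new capacity = max(2 * old + 2, required length), which is what produces the 16, 1000, 2002, 4006, ... sequence above, a default initial capacity of 16, and lines of identical length. CapacityGrowth and finalCapacity are illustrative names; for n equal-length lines it yields n * m + 2n - 2.

```java
public class CapacityGrowth {

    // Replays StringBuilder growth for `lines` appends of `charsPerLine` chars,
    // ASSUMING the OpenJDK rule: newCapacity = max(2 * oldCapacity + 2, required).
    static long finalCapacity(int lines, int charsPerLine) {
        long capacity = 16; // capacity of new StringBuilder()
        long length = 0;
        for (int i = 0; i < lines; i++) {
            long required = length + charsPerLine;
            if (required > capacity) {
                capacity = Math.max(2 * capacity + 2, required);
            }
            length = required;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(finalCapacity(32, 1000));  // 32062, the last value printed above
        System.out.println(finalCapacity(256, 1500)); // 384510
        // Bytes at 2 bytes per char (Java 8) while the old array and the
        // next, grown array (2 * c + 2 chars) are both alive during the copy
        long c = finalCapacity(256, 1500);
        System.out.println(2 * c + 2 * (2 * c + 2));  // 2307064
    }
}
```

The simulation deliberately ignores object and array headers, in line with the estimate above.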
Finally, for this kind of problem, you may want to work in chunks: write the content of your StringBuilder to a file or database as soon as it has processed, say, 1000 records, clear the StringBuilder, and start over with the next batch of records. That way you never hold more than about 1000 records' worth of data in memory.
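A minimal sketch of that batching approach, assuming the target is another file; for a database you would issue a batch insert where this sketch writes the batch. ChunkedFilter, copyMatchingLines and batchSize are hypothetical names, and unlike the question's code it appends a newline after each matched line so the output stays line-oriented.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;
import java.util.stream.Stream;

public class ChunkedFilter {

    // Copies lines starting with `prefix` from `in` to `out`, holding at most
    // `batchSize` matched lines in memory at any time. Returns the match count.
    static int copyMatchingLines(Path in, Path out, String prefix, int batchSize) throws IOException {
        StringBuilder batch = new StringBuilder();
        int inBatch = 0;
        int matched = 0;
        try (Stream<String> lines = Files.lines(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            // An iterator instead of forEach, so IOException from write() propagates
            Iterator<String> it = lines.iterator();
            while (it.hasNext()) {
                String line = it.next();
                if (!line.startsWith(prefix)) {
                    continue;
                }
                batch.append(line).append('\n');
                matched++;
                if (++inBatch == batchSize) {
                    writer.write(batch.toString()); // hand the batch to the file
                    batch.setLength(0);             // clear, but keep the same builder
                    inBatch = 0;
                }
            }
            writer.write(batch.toString());         // last, partial batch
        }
        return matched;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("dump", ".sql");
        Path out = Files.createTempFile("wp_posts", ".sql");
        in.toFile().deleteOnExit();
        out.toFile().deleteOnExit();
        Files.write(in, Arrays.asList(
                "INSERT INTO `wp_posts` VALUES (1);",
                "INSERT INTO `wp_users` VALUES (1);",
                "INSERT INTO `wp_posts` VALUES (2);"), StandardCharsets.UTF_8);
        int n = copyMatchingLines(in, out, "INSERT INTO `wp_posts`", 1000);
        System.out.println(n + " matching lines written"); // 2 matching lines written
    }
}
```

Since BufferedWriter already buffers its output, writing each matching line straight to the writer would also work for a file target; the explicit batch mirrors the "process 1000 records, then flush" idea that matters for a database.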
Upvotes: 0
Reputation: 44952
Your Runtime.totalMemory() calculation is pointless if you allow the JVM to resize the heap. Java will allocate heap memory as needed, as long as it doesn't exceed the -Xmx value. Since allocating heap memory 1 byte at a time would be very expensive, the JVM instead requests a larger amount of memory at a time (the actual amount is platform and JVM implementation specific).
Your code is currently loading the content of the file into memory, so objects will be created on the heap. Because of that, the JVM will most likely request memory from the OS, and you will observe an increased Runtime.totalMemory() value.
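To see the difference between committed and used heap, here is a minimal sketch (HeapUsage is a hypothetical class name) that prints totalMemory(), maxMemory() (which reflects -Xmx) and the used heap computed as totalMemory() - freeMemory(). System.gc() is only a hint to the JVM, so treat the numbers as indicative rather than exact.

```java
public class HeapUsage {

    // Heap actually occupied by objects: committed heap minus its free part.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint; acceptable in a test, not in production
        long before = usedBytes();

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200_000; i++) {
            sb.append("INSERT INTO `wp_posts` VALUES (").append(i).append(");");
        }

        System.gc();
        long after = usedBytes();
        System.out.println("committed (totalMemory): " + rt.totalMemory() / 1_000_000 + " MB");
        System.out.println("max (maxMemory, -Xmx):  " + rt.maxMemory() / 1_000_000 + " MB");
        System.out.println("used delta:              " + (after - before) / 1_000_000 + " MB");
        System.out.println("chars held: " + sb.length()); // keeps sb reachable up to here
    }
}
```

Running it as java -Xms300m -Xmx300m HeapUsage pins the heap size, so the committed value should stay close to 300 MB and only the used value moves.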
Try running your program with a strictly sized heap, e.g. by adding the -Xms300m -Xmx300m options. If you don't get an OutOfMemoryError, decrease the heap until you do. However, you also need to pay attention to GC cycles; heap size and GC activity go hand in hand and are a trade-off.
Alternatively, you can create a heap dump after the file is processed and then explore the data with Eclipse Memory Analyzer (MAT).
Upvotes: 0
Reputation: 18245
Your problem is in collect(StringBuilder::new, StringBuilder::append, StringBuilder::append). When you append something to the StringBuilder and its internal array is not big enough, it roughly doubles the array and copies the existing content into the new one.
Use new StringBuilder(int capacity) to preallocate the internal array.
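A minimal sketch of that suggestion applied to the question's stream. expectedChars is an estimate you must supply yourself (for example based on the roughly 18 MB measured in the question); PresizedCollect and collectMatching are hypothetical names. The supplier allocates the full capacity once, so the repeated grow-and-copy steps are avoided as long as the estimate is large enough.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class PresizedCollect {

    // Collects the matching lines into one StringBuilder created with an
    // up-front capacity (expectedChars is a caller-supplied estimate).
    static String collectMatching(Path path, String prefix, int expectedChars) throws IOException {
        try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
            return stream
                    .filter(line -> line.startsWith(prefix))
                    .collect(() -> new StringBuilder(expectedChars), // pre-sized buffer
                             StringBuilder::append,                  // accumulate one line
                             StringBuilder::append)                  // combiner (parallel only)
                    .toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Small stand-in for the real dump file
        Path dump = Files.createTempFile("dump", ".sql");
        dump.toFile().deleteOnExit();
        Files.write(dump, Arrays.asList(
                "INSERT INTO `wp_posts` VALUES (1);",
                "INSERT INTO `wp_users` VALUES (1);",
                "INSERT INTO `wp_posts` VALUES (2);"), StandardCharsets.UTF_8);
        String wpPosts = collectMatching(dump, "INSERT INTO `wp_posts`", 1024);
        System.out.println(wpPosts);
    }
}
```

Keep the stream sequential: with a parallel stream the supplier runs once per chunk, so every chunk would allocate expectedChars.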
The second problem is that you have a big file, but you put the result into a StringBuilder. This is very strange to me: it is effectively the same as reading the whole file into a String without using a Stream.
Upvotes: 1