Reputation: 51
I have an application that lets users get data from a database and download it as CSV files.
The general workflow follows:
My problem is that the backend keeps throwing an "OOM Java heap space" error in some extreme cases, where all of my memory (4GB) gets filled. My plan was to load the data from the database via pagination (not all at once, to save memory) and generate a CSV file for each page of data. That way, GC should clear the memory once a CSV has been generated, so the overall memory usage stays reasonably low. In practice, however, memory keeps increasing until it is all used up; the GC does not work as expected. In the extreme case I have 18 pages in total, with around 200,000 records (from the db) per page.
I used JProfiler to monitor heap usage and found that the retained size of those large byte[] objects is not 0, which might mean there are still references to them (I guess that's why GC does not clear them from memory as expected).
How should I optimize my code and VM settings so that memory usage stays below 1GB in the extreme case? What keeps those large byte[] objects from being cleared by GC as expected?
The code to get data from the database and generate the CSV files:
@Override
@Async
@Transactional(timeout = DOWNLOAD_DATA_TRANSACTION_TIME_LIMIT)
public void startDownloadDataInCSVBySearchQuery(SearchQuery query, DownloadRequestRecord downloadRecord) throws IOException {
    logger.debug(Thread.currentThread().getName() + ": starts to process download data");
    String username = downloadRecord.getUsername();
    // get posts from database first
    List<? extends SocialPost> posts = this.postsService.getPosts(query);
    try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        // get ids of posts
        List<String> postsIDs = this.getPostsIDsFromPosts(posts);
        int postsSize = postsIDs.size();
        // do pagination db search. For each page, there are 1500 posts
        int numPages = postsSize / POSTS_COUNT_PER_PAGE + 1;
        for (int i = 0; i < numPages; i++) {
            logger.debug("Download comments: start at page {}, out of total page {}", i + 1, numPages);
            int pageStartPos = i * POSTS_COUNT_PER_PAGE; // POSTS_COUNT_PER_PAGE is set to 1500
            int pageEndPos = Math.min((i + 1) * POSTS_COUNT_PER_PAGE, postsSize);
            // get post ids per page
            List<String> postsIDsPerPage = postsIDs.subList(pageStartPos, pageEndPos);
            // use post ids to get corresponding comments from db, via sql "IN"
            List<Comment> commentsPerPage = this.commentsService.getCommentsByPostsIDs(postsIDsPerPage);
            // generate csv file for page data and upload to google cloud
            String commentsFileName = "comments-" + downloadRecord.getDownloadTime() + "-" + (i + 1) + ".csv";
            this.csvUtil.generateCommentsCsvFileStream(commentsPerPage, commentsFileName, out);
            this.googleCloudStorageInstance.uploadDownloadOutputStreamData(out.toByteArray(), commentsFileName);
        }
    } catch (Exception ex) {
        logger.error("Exception from downloading data: ", ex);
    }
}
Code to generate the CSV file:
// use Apache csv
public void generateCommentsCsvFileStream(List<Comment> comments, String filename, ByteArrayOutputStream out) throws IOException {
    CSVPrinter csvPrinter = new CSVPrinter(new OutputStreamWriter(out), CSVFormat.DEFAULT.withHeader(PostHeaders.class).withQuoteMode(QuoteMode.MINIMAL));
    for (Comment comment : comments) {
        List<Object> record = Arrays.asList(
            // write csv content
            comment.getPageId(),
            ...
        );
        csvPrinter.printRecord(record);
    }
    // close printer to release memory
    csvPrinter.flush();
    csvPrinter.close();
}
Code to upload the file to Google Cloud Storage:
public Blob uploadDownloadOutputStreamData(byte[] fileStream, String filename) {
    logger.debug("Upload file: '{}' to google cloud storage", filename);
    BlobId blobId = BlobId.of(this.DownloadDataBucketName, filename);
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).build();
    return this.cloudStorage.create(blobInfo, fileStream);
}
The heap usage keeps increasing as the page number increases. The G1 Old Gen heap usage is still very high after the system crashes.
The G1 Eden space is almost empty; the big objects go straight into the Old Gen.
Old Gen GC activity is low; most of the GC activity happens in the Eden space:
The heap walker shows that the retained size of those big byte[] objects is not 0.
Upvotes: 1
Views: 2366
Reputation: 13690
You're using a single instance of ByteArrayOutputStream, which just writes to an in-memory byte array.
That looks like a mistake, because you seem to want to upload one page at a time, not the accumulated result so far (which includes ALL pages).
By the way, the try-with-resources here is pointless:
try (ByteArrayOutputStream out = new ByteArrayOutputStream())
ByteArrayOutputStream does not need to be closed, as it lives entirely in memory. Just remove that, and create a new instance for each page (inside the pages for loop) instead of re-using the same instance for all pages, and it might just work fine.
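Something like this, as a rough sketch of the loop body, reusing the names from your code (untested, so treat it as an illustration rather than a drop-in fix):

for (int i = 0; i < numPages; i++) {
    int pageStartPos = i * POSTS_COUNT_PER_PAGE;
    int pageEndPos = Math.min((i + 1) * POSTS_COUNT_PER_PAGE, postsSize);
    List<String> postsIDsPerPage = postsIDs.subList(pageStartPos, pageEndPos);
    List<Comment> commentsPerPage = this.commentsService.getCommentsByPostsIDs(postsIDsPerPage);

    // a fresh stream per page: once the upload returns, nothing references
    // this page's bytes anymore, so the GC can reclaim them
    ByteArrayOutputStream pageOut = new ByteArrayOutputStream();
    String commentsFileName = "comments-" + downloadRecord.getDownloadTime() + "-" + (i + 1) + ".csv";
    this.csvUtil.generateCommentsCsvFileStream(commentsPerPage, commentsFileName, pageOut);
    this.googleCloudStorageInstance.uploadDownloadOutputStreamData(pageOut.toByteArray(), commentsFileName);
}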
EDIT
Another piece of advice would be to break this code up into more methods... not just because smaller methods are more readable, but because you're keeping temporary variables in scope for too long (causing memory to stick around longer than needed).
For example:
List<? extends SocialPost> posts = this.postsService.getPosts(query);
try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
    // get ids of posts
    List<String> postsIDs = this.getPostsIDsFromPosts(posts);
    ....
From this point on, posts is not used anymore, and I assume it contains a lot of stuff... so you should "drop" that variable once you've got the IDs.
Do something like this instead:
List<String> postsIDs = getAllPostIds(query);
....

List<String> getAllPostIds(SearchQuery query) {
    // this variable will be GC'd after this method returns, as it's no longer
    // referenced (assuming getPostsIDsFromPosts() doesn't store it in a field)
    List<? extends SocialPost> posts = this.postsService.getPosts(query);
    return this.getPostsIDsFromPosts(posts);
}
Upvotes: 1