Out of memory (Java heap space) when generating a large CSV file

Question

I have an application that lets users fetch data from a database and download it as a CSV file.

The general workflow is as follows:

1. The user clicks the download button in the frontend.
2. The backend (Spring Boot in this case) starts an async thread to fetch the data from the database.
3. The backend generates CSV files from the data fetched in step 2 and uploads them to Google Cloud Storage.
4. The backend sends the user an email with a signed URL to download the data.

My problem is that the backend keeps throwing an "OOM Java heap space" error in some extreme cases, in which all of the available memory (4 GB) gets used up. My plan was to load the data from the database page by page rather than all at once, and to generate one CSV file per page. That way the GC could reclaim each page's memory once its CSV file had been generated, and overall memory usage would stay low. In practice, memory usage keeps growing until everything is used up, and the GC does not reclaim it as I expected. In the extreme case there are 18 pages in total, with around 200,000 records (from the database) per page.

I used JProfiler to monitor heap usage and found that the retained size of the large byte[] objects is not 0, which suggests that something still holds references to them (I guess that is why the GC does not clear them from memory as expected).
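One thing worth checking when byte[] arrays stay reachable like this is whether a single ByteArrayOutputStream is shared by all pages. The snippet below is a minimal, stdlib-only sketch, not my production code: the class and method names (SharedBufferDemo, uploadSizes) and the page sizes are made up for illustration. It shows that a buffer that is never reset keeps every earlier page, so each toByteArray() call copies all pages written so far.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class SharedBufferDemo {

    // Hypothetical helper: writes `pages` pages of `bytesPerPage` bytes into ONE shared
    // buffer and records the size of the array toByteArray() returns after each page.
    static int[] uploadSizes(int pages, int bytesPerPage, boolean resetAfterEachPage) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] sizes = new int[pages];
        byte[] pageData = new byte[bytesPerPage]; // stand-in for one page of CSV bytes
        for (int i = 0; i < pages; i++) {
            out.write(pageData, 0, pageData.length);
            // toByteArray() allocates a fresh copy of everything written so far
            sizes[i] = out.toByteArray().length;
            if (resetAfterEachPage) {
                // discards the accumulated bytes; the internal array is kept for reuse
                out.reset();
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(uploadSizes(3, 10, false))); // prints [10, 20, 30]
        System.out.println(Arrays.toString(uploadSizes(3, 10, true)));  // prints [10, 10, 10]
    }
}
```

Without the reset, the array handed to the uploader for page N contains pages 1 through N, and the stream's internal array stays reachable for as long as the stream itself is in scope.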

How should I optimize my code and JVM settings so that memory usage stays below 1 GB even in the extreme case? What prevents those large byte[] objects from being collected by the GC?
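To make the intended behaviour concrete, here is a rough sketch of the per-page pattern I am aiming for, in which each page gets its own short-lived buffer. It uses only the JDK. PerPageCsvExport, exportPages, fetchPage and uploader are hypothetical names standing in for my services and the cloud storage upload, and the naive comma join stands in for a real CSV library that would handle quoting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.IntFunction;

public class PerPageCsvExport {

    // fetchPage: hypothetical stand-in for the paginated database query.
    // uploader: hypothetical stand-in for the upload call (file name, file bytes).
    static void exportPages(int numPages,
                            IntFunction<List<String[]>> fetchPage,
                            BiConsumer<String, byte[]> uploader) {
        for (int i = 0; i < numPages; i++) {
            List<String[]> rows = fetchPage.apply(i); // only this page's rows are held
            // A new buffer per page: once the iteration ends, nothing here references
            // the page's rows or bytes any more, so they become eligible for collection.
            try (ByteArrayOutputStream out = new ByteArrayOutputStream();
                 Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                for (String[] row : rows) {
                    writer.write(String.join(",", row)); // no quoting or escaping here
                    writer.write("\n");
                }
                writer.flush(); // push buffered characters into `out` before reading it
                uploader.accept("comments-" + (i + 1) + ".csv", out.toByteArray());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

This still holds one full page in memory twice (the buffer plus the copy made by toByteArray()), so peak usage depends on the page size; writing directly to a stream supplied by the storage client would avoid that, assuming the client offers one.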

The code to get data from database and generate csv file

    @Override
    @Async
    @Transactional(timeout = DOWNLOAD_DATA_TRANSACTION_TIME_LIMIT)
    public void startDownloadDataInCSVBySearchQuery(SearchQuery query, DownloadRequestRecord downloadRecord) throws IOException {
        logger.debug(Thread.currentThread().getName() + ": starts to process download data");
        String username = downloadRecord.getUsername();
        // get posts from database first
        List posts = this.postsService.getPosts(query);
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            // get ids of posts
            List postsIDs = this.getPostsIDsFromPosts(posts);
            int postsSize = postsIDs.size();
            // do pagination db search. For each page, there are 1500 posts
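            // round up, so that an exact multiple of the page size does not produce an extra empty page
            int numPages = (postsSize + POSTS_COUNT_PER_PAGE - 1) / POSTS_COUNT_PER_PAGE;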
            for (int i = 0; i < numPages; i++) {
                logger.debug("Download comments: start at page {}, out of total page {}", i + 1, numPages);
                int pageStartPos = i * POSTS_COUNT_PER_PAGE; // this is set to 1500
                int pageEndPos = Math.min((i + 1) * POSTS_COUNT_PER_PAGE, postsSize);
                // get post ids per page
                List postsIDsPerPage = postsIDs.subList(pageStartPos, pageEndPos);
                // use posts ids to get corresponding comments from db, via sql "IN"
                List commentsPerPage = this.commentsService.getCommentsByPostsIDs(postsIDsPerPage);
                // generate csv file for page data and upload to google cloud
                String commentsFileName = "comments-" + downloadRecord.getDownloadTime() + "-" + (i + 1) + ".csv";
                this.csvUtil.generateCommentsCsvFileStream(commentsPerPage, commentsFileName, out);
                this.googleCloudStorageInstance.uploadDownloadOutputStreamData(out.toByteArray(), commentsFileName);
            }
        } catch (Exception ex) {
            logger.error("Exception from downloading data: ", ex);
        }
    }

Code to generate csv file

// uses Apache Commons CSV
public void generateCommentsCsvFileStream(List<Comment> comments, String filename, ByteArrayOutputStream out) throws IOException {
        CSVPrinter csvPrinter = new CSVPrinter(new OutputStreamWriter(out), CSVFormat.DEFAULT.withHeader(PostHeaders.class).withQuoteMode(QuoteMode.MINIMAL));
        for (Comment comment: comments) {
            List

