Reputation: 57
We have a background job that fetches records from a particular table in batches of 1000 records at a time. The job runs every 30 minutes. Each record has an email (key) and a reason (value). The issue is that we have to look these records up against a data warehouse (a filtering step that pulls the last 180 days of data from the warehouse), and a call to the data warehouse is very costly in terms of time (approximately 45 minutes). The existing logic is: check for a flat file on disk; if it does not exist, call the data warehouse, fetch the records (0.2 to 0.25 million of them) and write them to disk; on subsequent runs, do the lookup against the flat file only, by loading the entire file into memory and searching/filtering in memory. This caused OutOfMemory errors many times.
So we modified the logic: we split the data warehouse call into 4 equal intervals and stored each result in its own file with ObjectOutputStream. On subsequent runs we load the data into memory one interval at a time, i.e. 0-30 days, 31-60 days and so on (roughly as in the sketch below). But this is not helping either. Can experts please suggest the ideal way to tackle this issue? Someone on the senior team suggested using CouchDB for storing and querying the data, but at first glance I would prefer a good solution that works with the existing infrastructure; if there is none, we can think about other tools.
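For context, a minimal sketch of the per-interval caching described above, under assumptions about the file naming and helper shape (the real code uses helpers such as getInvalidLeadsFromDWHS and setInvalidDataSetFromTempFile):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Persist one interval's worth of warehouse data to its own file.
private void writeIntervalToTempFile(Map<String, String> invalidDataSet, String interval) throws IOException {
    File tempFile = new File("invalid_leads_" + interval + ".ser"); // hypothetical naming
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(tempFile))) {
        out.writeObject(new HashMap<String, String>(invalidDataSet)); // HashMap is Serializable
    }
}

// Read one interval back; note the whole interval still ends up on the heap at once.
@SuppressWarnings("unchecked")
private void readIntervalFromTempFile(Map<String, String> invalidDataSet, String interval) throws IOException, ClassNotFoundException {
    File tempFile = new File("invalid_leads_" + interval + ".ser");
    try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(tempFile))) {
        invalidDataSet.putAll((Map<String, String>) in.readObject());
    }
}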
Filtering code as of now.
private void filterRecords(List<Record> filterRecords) {
    long start = System.currentTimeMillis();
    logger.error("In filter Records - start (ms) : " + start);
    Map<String, String> invalidDataSet = new HashMap<String, String>(5000);
    try {
        boolean isValidTempResultFile = isValidTempResultFile();
        // 1. Fetch invalid leads data from DWHS only if the existing file is more than 7 days old.
        String[] intervals = {"0", "45", "90", "135"};
        if (!isValidTempResultFile) {
            logger.error("#CCBLDM isValidTempResultFile false");
            for (String interval : intervals) {
                invalidDataSet.clear();
                // This call takes approx 45 minutes to fetch the data
                getInvalidLeadsFromDWHS(invalidDataSet, interval, start);
                filterDataSet(invalidDataSet, filterRecords);
            }
        } else {
            // Load data from the temporary file in intervals to avoid heap space issues
            logger.error("#CCBLDM isValidTempResultFile true");
            intervals = new String[]{"1", "2", "3", "4"};
            for (String interval : intervals) {
                // GC won't happen immediately here, causing the OOM issue.
                invalidDataSet.clear();
                // Keeps 45 days of data in memory at a time
                setInvalidDataSetFromTempFile(invalidDataSet, interval);
                // 2. Mark the current record as incorrect if it belongs to the invalid email list
                filterDataSet(invalidDataSet, filterRecords);
            }
        }
    } catch (Exception filterExc) {
        Scheduler.log.error("Exception occurred while Filtering Records", filterExc);
    } finally {
        invalidDataSet.clear();
        invalidDataSet = null;
    }
    long end = System.currentTimeMillis();
    logger.error("Total time taken to filter all records ::: [" + (end - start) + "] ms.");
}
Upvotes: 1
Views: 160
Reputation: 46432
I'd strongly suggest slightly changing your infrastructure. If I understand you correctly, you're looking something up in a file and something up in a map. Working with a file is a pain, and loading everything into memory causes OOME. You can do better with a memory-mapped file, which allows fast and simple access.
There's Chronicle-Map, which offers a Map interface to data stored off-heap. The data actually reside on disk and take up main memory only as needed. You need to make your keys and values Serializable (which AFAIK you did already) or use an alternative serialization (which may be faster).
It's not a database, it's just a ConcurrentMap, which makes working with it very simple. The whole installation is just adding a line like compile "net.openhft:chronicle-map:3.14.5" to build.gradle (or a few Maven lines).
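For illustration, a minimal sketch of how such a persisted map could be set up and queried; the file name, entry count, and average key/value samples below are assumptions you'd tune for your data:

import net.openhft.chronicle.map.ChronicleMap;
import java.io.File;
import java.io.IOException;

public class InvalidLeadsStore {
    public static void main(String[] args) throws IOException {
        // Off-heap, file-backed map: survives restarts and avoids loading everything onto the heap.
        try (ChronicleMap<String, String> invalidLeads = ChronicleMap
                .of(String.class, String.class)
                .name("invalid-leads")
                .averageKey("someone@example.com")  // assumed average key
                .averageValue("hard bounce")        // assumed average value
                .entries(250_000)                   // roughly the 0.2-0.25 million records mentioned
                .createPersistedTo(new File("invalid-leads.dat"))) {

            // Fill once from the warehouse result...
            invalidLeads.put("someone@example.com", "hard bounce");

            // ...and on later runs just look up each record's email without loading the whole file.
            String reason = invalidLeads.get("someone@example.com");
            System.out.println(reason);
        }
    }
}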
There are alternatives, but Chronicle-Map is what I've tried (just started with it, but so far everything works perfectly).
There's also Chronicle-Queue, in case you need batch processing, but I strongly doubt you'll need it as you're limited by your disk rather than main memory.
Upvotes: 2
Reputation: 2168
This is a typical use case for batch-processing code. You could introduce a new column, say 'isProcessed', defaulting to 'false'. Read, say, 100 rows at a time (where id >= 1 && id <= 100), process them, and mark the flag as true; then read the next 100, and so on. At the end of the job, reset all flags to false (a minimal JDBC sketch of this idea is at the end of this answer). But over time it can become difficult to maintain and develop features on such a custom framework. There are open-source alternatives.
Spring Batch is a very good framework for these use cases and could be considered: https://docs.spring.io/spring-batch/trunk/reference/html/spring-batch-intro.html.
There is also Easy Batch: https://github.com/j-easy/easy-batch, which doesn't require Spring framework knowledge.
But if it is a huge data set that will keep growing in the future, consider moving to a 'big data' tech stack.
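As promised above, a minimal sketch of the hand-rolled approach, assuming a plain JDBC connection, a database that supports LIMIT, and hypothetical table/column names (records, id, email, reason, is_processed):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BatchReader {
    private static final int BATCH_SIZE = 100;

    // Reads unprocessed rows in fixed-size batches and marks each batch as processed.
    public void processAll(Connection conn) throws SQLException {
        String select = "SELECT id, email, reason FROM records WHERE is_processed = false ORDER BY id LIMIT ?";
        String update = "UPDATE records SET is_processed = true WHERE id = ?";
        while (true) {
            boolean foundRows = false;
            try (PreparedStatement ps = conn.prepareStatement(select)) {
                ps.setInt(1, BATCH_SIZE);
                try (ResultSet rs = ps.executeQuery();
                     PreparedStatement mark = conn.prepareStatement(update)) {
                    while (rs.next()) {
                        foundRows = true;
                        handle(rs.getString("email"), rs.getString("reason")); // your filtering/lookup logic
                        mark.setLong(1, rs.getLong("id"));
                        mark.addBatch();
                    }
                    mark.executeBatch();
                }
            }
            if (!foundRows) {
                break; // nothing left to process
            }
        }
        // Reset the flags at the end of the job, as described above.
        try (PreparedStatement reset = conn.prepareStatement("UPDATE records SET is_processed = false")) {
            reset.executeUpdate();
        }
    }

    private void handle(String email, String reason) {
        // placeholder for the per-record work
    }
}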
Upvotes: 2