abu

Reputation: 43

Java application crashed by a suspicious JDBC memory leak

I have been working on a Java application which crawls pages from the Internet with HttpClient (version 4.3.3). It uses one fixed thread pool with 5 threads, each running a loop. The pseudocode is as follows.

public class Spiderling implements Runnable {
    @Override
    public void run() {
        while (true) {
            T task = null;
            try {
                task = scheduler.poll();
                if (task != null) {
                    TaskConfig taskConfig;
                    if (/* Ehcache contains the task's config */) {
                        taskConfig = /* Ehcache.getConfig(...) */;
                    } else {
                        taskConfig = /* query task config from db */; // conn closed every time
                        /* put taskConfig into Ehcache */
                    }
                    spider(task, taskConfig);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        LOG.error("spiderling is DEAD"); // only reached if the loop ever exits
    }
}
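The loop above can be sketched as a runnable, self-contained program. This is only an illustration of the described setup: the queue contents, the 200 ms poll timeout, and the stubbed-out crawl step are placeholders, not the real project code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the setup described above: a fixed pool of 5 looping
// workers draining a shared queue. Names like "scheduler" mirror the
// pseudocode, not the real project.
public class SpiderPoolSketch {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> scheduler = new LinkedBlockingQueue<>();
        for (int i = 0; i < 20; i++) scheduler.add("task-" + i);

        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 5; i++) {
            pool.submit(() -> {
                while (true) {
                    try {
                        // poll with a timeout so this sketch can end once the queue drains;
                        // the original loops forever
                        String task = scheduler.poll(200, TimeUnit.MILLISECONDS);
                        if (task == null) break;
                        processed.incrementAndGet(); // spider(task, taskConfig) would go here
                    } catch (Exception e) {
                        e.printStackTrace(); // swallow and keep looping, as in the original
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(processed.get());
    }
}
```

Because every worker swallows exceptions and loops, a thread in this design never dies quietly; whatever kills the process has to come from outside the loop.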

I am running it with the following arguments: -Duser.timezone=GMT+8 -server -Xms1536m -Xmx1536m -Xloggc:/home/datalord/logs/gc-2016-07-23-10-28-24.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC, on a server (2 CPUs, 2 GB memory), and it crashes fairly regularly, about once every two or three days, with no OutOfMemoryError and no JVM error log.

Here is my analysis:

  1. I analysed the GC log with GCeasy; the report is here. The weird thing is that the Old Gen grows slowly until it reaches the allocated max heap size, but a Full GC has never happened, not even once.
  2. I suspected a memory leak, so I dumped the heap with jmap -dump:format=b,file=soldier.bin and analysed the dump file with Eclipse MAT. Here is the problem suspect, an object that occupies 280+ MB.

The class "com.mysql.jdbc.NonRegisteringDriver", loaded by "sun.misc.Launcher$AppClassLoader @ 0xa0018490", occupies 281,118,144 (68.91%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "".

Keywords com.mysql.jdbc.NonRegisteringDriver java.util.concurrent.ConcurrentHashMap$Segment[] sun.misc.Launcher$AppClassLoader @ 0xa0018490.

I use c3p0-0.9.1.2 as the MySQL connection pool, mysql-connector-java-5.1.34 as the JDBC connector, and Ehcache-2.6.10 as the in-memory cache. I have read every post about 'com.mysql.jdbc.NonRegisteringDriver memory leak' and still have no clue.

This problem has driven me crazy for several days, any advice or help will be appreciated!

**********************Supplementary description on 07-24****************

I use a Java web + ORM framework called JFinal (github.com/jfinal/jfinal), which is open source on GitHub. Here is some core code for further description of the problem.

/**
 * CacheKit. Useful tool box for EhCache.
 * 
 */

public class CacheKit {

private static CacheManager cacheManager;
private static final Logger log = Logger.getLogger(CacheKit.class);

static void init(CacheManager cacheManager) {
    CacheKit.cacheManager = cacheManager;
}

public static CacheManager getCacheManager() {
    return cacheManager;
}

static Cache getOrAddCache(String cacheName) {
    Cache cache = cacheManager.getCache(cacheName);
    if (cache == null) {
        synchronized(cacheManager) {
            cache = cacheManager.getCache(cacheName);
            if (cache == null) {
                log.warn("Could not find cache config [" + cacheName + "], using default.");
                cacheManager.addCacheIfAbsent(cacheName);
                cache = cacheManager.getCache(cacheName);
                log.debug("Cache [" + cacheName + "] started.");
            }
        }
    }
    return cache;
}

public static void put(String cacheName, Object key, Object value) {
    getOrAddCache(cacheName).put(new Element(key, value));
}

@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key) {
    Element element = getOrAddCache(cacheName).get(key);
    return element != null ? (T)element.getObjectValue() : null;
}

@SuppressWarnings("rawtypes")
public static List getKeys(String cacheName) {
    return getOrAddCache(cacheName).getKeys();
}

public static void remove(String cacheName, Object key) {
    getOrAddCache(cacheName).remove(key);
}

public static void removeAll(String cacheName) {
    getOrAddCache(cacheName).removeAll();
}

@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key, IDataLoader dataLoader) {
    Object data = get(cacheName, key);
    if (data == null) {
        data = dataLoader.load();
        put(cacheName, key, data);
    }
    return (T)data;
}

@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key, Class<? extends IDataLoader> dataLoaderClass) {
    Object data = get(cacheName, key);
    if (data == null) {
        try {
            IDataLoader dataLoader = dataLoaderClass.newInstance();
            data = dataLoader.load();
            put(cacheName, key, data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    return (T)data;
}

}

I use CacheKit like CacheKit.get("cfg_extract_rule_tree", extractRootId, new ExtractRuleTreeDataloader(extractRootId)), and the class ExtractRuleTreeDataloader is called if nothing is found in the cache under extractRootId.

public class ExtractRuleTreeDataloader implements IDataLoader {
public static final Logger LOG = LoggerFactory.getLogger(ExtractRuleTreeDataloader.class);
private int                ruleTreeId;

public ExtractRuleTreeDataloader(int ruleTreeId) {
    super();
    this.ruleTreeId = ruleTreeId;
}

@Override
public Object load() {
    List<Record> ruleTreeList = Db.find("SELECT * FROM cfg_extract_fule WHERE root_id=?", ruleTreeId);
    TreeHelper<ExtractRuleNode> treeHelper = ExtractUtil.batchRecordConvertTree(ruleTreeList); // convert List<Record> to a tree
    if (treeHelper.isValidTree()) {
        return treeHelper.getRoot();
    } else {
        LOG.warn("rule tree id :{} is an error tree #end#", ruleTreeId);
        return null;
    }
}
}

As I said before, I use the JFinal ORM. The Db.find method code is:

public List<Record> find(String sql, Object... paras) {
    Connection conn = null;
    try {
        conn = config.getConnection();
        return find(config, conn, sql, paras);
    } catch (Exception e) {
        throw new ActiveRecordException(e);
    } finally {
        config.close(conn);
    }
}

and the config close method code is:

public final void close(Connection conn) {
    if (threadLocal.get() == null)      // in transaction if conn in threadlocal
        if (conn != null)
            try {conn.close();} catch (SQLException e) {throw new ActiveRecordException(e);}
}

There is no transaction in my code, so I am pretty sure conn.close() is called every time.

**********************more description on 07-28****************

First, I use Ehcache to store the taskConfigs in memory. The taskConfigs almost never change, so I want to keep them in memory forever, and spill them to disk if memory cannot hold them all.

I used MAT to find the GC roots of NonRegisteringDriver; the result is shown in the following picture (the GC roots of NonRegisteringDriver).

But I still don't understand why the default behavior of Ehcache leads to a memory leak. TaskConfig is a class that extends the Model class.

public class TaskConfig extends Model<TaskConfig> {
    private static final long    serialVersionUID = 5000070716569861947L;
    public static TaskConfig DAO              = new TaskConfig();

}

and the source code of Model is on this page (github.com/jfinal/jfinal/blob/jfinal-2.0/src/com/jfinal/plugin/activerecord/Model.java). I can't find any reference (either direct or indirect) to the connection object there, as @Jeremiah guessed.

Then I read the source code of NonRegisteringDriver, and I don't understand why its map field connectionPhantomRefs holds more than 5000 entries of <ConnectionPhantomReference, ConnectionPhantomReference> while its queue field refQueue contains no ConnectionImpl at all. I can see the cleanup code in the class AbandonedConnectionCleanupThread, which removes the ref from NonRegisteringDriver.connectionPhantomRefs whenever it takes an abandoned connection ref from NonRegisteringDriver.refQueue.

@Override
public void run() {
    threadRef = this;
    while (running) {
        try {
            Reference<? extends ConnectionImpl> ref = NonRegisteringDriver.refQueue.remove(100);
            if (ref != null) {
                try {
                    ((ConnectionPhantomReference) ref).cleanup();
                } finally {
                    NonRegisteringDriver.connectionPhantomRefs.remove(ref);
                }
            }

        } catch (Exception ex) {
            // no where to really log this if we're static
        }
    }
}
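The growth visible in the MAT report can be reproduced in miniature with the standard library alone: phantom references are only pushed onto their ReferenceQueue after the referent is garbage-collected, so as long as the referents stay strongly reachable (or churn outpaces collection), the bookkeeping map fills up while the queue stays empty. The class below is an illustrative sketch, not the driver's real code.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the driver's bookkeeping: each new "connection" registers a
// phantom reference in a static map; a cleanup thread would only remove
// entries that the GC has pushed onto the reference queue. While the
// referents are still strongly reachable, nothing is ever enqueued.
public class PhantomRefMapSketch {
    static final ReferenceQueue<Object> refQueue = new ReferenceQueue<>();
    static final ConcurrentHashMap<PhantomReference<Object>, PhantomReference<Object>> refs =
            new ConcurrentHashMap<>();

    public static void main(String[] args) {
        Object[] live = new Object[5000]; // stand-ins for connections that are still reachable
        for (int i = 0; i < 5000; i++) {
            live[i] = new Object();
            PhantomReference<Object> ref = new PhantomReference<>(live[i], refQueue);
            refs.put(ref, ref);
        }
        // Nothing has been collected, so the queue is empty while the map is full,
        // matching the MAT observation: 5000+ entries, nothing on refQueue.
        System.out.println(refs.size() + " " + (refQueue.poll() == null));
    }
}
```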

Appreciate the help offered by @Jeremiah !

Upvotes: 0

Views: 2353

Answers (2)

abu

Reputation: 43

The root cause is that the Linux OS runs out of memory and the OOM killer kills the process. I found log entries like the following in /var/log/messages.

Aug  3 07:24:03 iZ233tupyzzZ kernel: Out of memory: Kill process 17308 (java) score 890 or sacrifice child
Aug  3 07:24:03 iZ233tupyzzZ kernel: Killed process 17308, UID 0, (java) total-vm:2925160kB, anon-rss:1764648kB, file-rss:248kB
Aug  3 07:24:03 iZ233tupyzzZ kernel: Thread (pooled) invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug  3 07:24:03 iZ233tupyzzZ kernel: Thread (pooled) cpuset=/ mems_allowed=0
Aug  3 07:24:03 iZ233tupyzzZ kernel: Pid: 6721, comm: Thread (pooled) Not tainted 2.6.32-431.23.3.el6.x86_64 #1

I also found that the default value of maxIdleTime is 20 seconds in C3p0Plugin, which is the c3p0 plugin in JFinal. I think this is why the NonRegisteringDriver object occupies 280+ MB in the MAT report. After I set maxIdleTime to 3600 seconds, NonRegisteringDriver was no longer a suspect in the MAT report.
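For reference, the same timeout can also be raised through c3p0's own configuration file rather than the plugin; this is a hedged sketch of a c3p0.properties fragment (property name per the c3p0 documentation, value in seconds):

```properties
# c3p0.properties — keep pooled connections idle for up to an hour
c3p0.maxIdleTime=3600
```

With a longer idle timeout the pool churns far fewer connections, so far fewer ConnectionPhantomReference entries accumulate in the driver's tracking map between collections.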

I also reset the JVM arguments to -Xms512m -Xmx512m, and the Java program has been running well for several days. Full GC is triggered as expected when the Old Gen fills up.

Upvotes: 0

Jeremiah

Reputation: 1145

From the comments above I'm almost certain your memory leak is actually memory usage from Ehcache. The ConcurrentHashMap you're seeing is the one backing the MemoryStore, and I'm guessing that the taskConfig holds a reference (either directly or indirectly) to the connection object, which is why it's showing up in your heap dump.

Having eternal="true" in the default cache makes it so the inserted objects are never allowed to expire. Even without that, the timeToLive and timeToIdle values default to an infinite lifetime!

Combine that with the default behavior of Ehcache when retrieving elements, which (last I checked) is to copy them through serialization! You're just stacking up new object references each time the taskConfig is extracted and put back into Ehcache.

The best way to test this (in my opinion) is to change your default cache configuration. Change eternal to false, and implement a timeToIdle value. timeToIdle is a time (in seconds) that a value may exist in the cache without being accessed.

<ehcache>
    <diskStore path="java.io.tmpdir"/>
    <defaultCache maxElementsInMemory="10000"
                  eternal="false"
                  timeToIdle="120"
                  overflowToDisk="true"
                  diskPersistent="false"
                  diskExpiryThreadIntervalSeconds="120"/>
</ehcache>

If that works, then you may want to look into further tweaking your ehcache configuration settings, or providing a more customized cache reference other than default for your class.

There are multiple performance considerations when tweaking the ehcache. I'm sure that there is a better configuration for your business model. The Ehcache documentation is good, but I found the site to be a bit scattered when I was trying to figure it out. I've listed some links that I found useful below.

http://www.ehcache.org/documentation/2.8/configuration/cache-size.html

http://www.ehcache.org/documentation/2.8/configuration/configuration.html

http://www.ehcache.org/documentation/2.8/apis/cache-eviction-algorithms.html#provided-memorystore-eviction-algorithms

Good luck!


To test your memory leak, try the following:

  1. Insert a TaskConfig into Ehcache.
  2. Immediately retrieve it back out of the cache.
  3. Output the value of taskConfig1.equals(taskConfig2).

If it returns false, that is your memory leak. Override equals and hashCode in your TaskConfig object and rerun the test.
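A minimal sketch of that test, assuming TaskConfig can be keyed by a single int id; the real class extends JFinal's Model, so the field and class names here are hypothetical stand-ins:

```java
public class TaskConfigEqualsSketch {
    // Hypothetical stand-in for the real TaskConfig: an int id is assumed
    // here purely for illustration.
    static final class TaskConfig {
        final int id;
        TaskConfig(int id) { this.id = id; }

        // Value-based equality: two copies with the same id compare equal,
        // even if they are distinct objects (as a deserialized copy would be).
        @Override public boolean equals(Object o) {
            return o instanceof TaskConfig && ((TaskConfig) o).id == this.id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    public static void main(String[] args) {
        TaskConfig original = new TaskConfig(42);
        TaskConfig copy = new TaskConfig(42); // stands in for the copy Ehcache hands back
        System.out.println(original.equals(copy));
    }
}
```

Without the override, Object.equals compares identity, so the copy Ehcache returns would report false even though its contents are identical.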

Upvotes: 1
