Reputation: 43
I have been working on a java application which crawls page from Internet with http-client(version4.3.3). It uses one fixedThreadPool with 5 threads,each is a loop thread .The pseudocode is following.
public class Spiderling extends Runnable{
@Override
public void run() {
while (true) {
T task = null;
try {
task = scheduler.poll();
if (task != null) {
if Ehcache contains task's config
taskConfig = Ehcache.getConfig;
else{
taskConfig = Query task config from db;//close the conn every time
put taskConfig into Ehcache
}
spider(task,taskConfig);
}
} catch (Exception e) {
e.printStackTrace();
}
}
LOG.error("spiderling is DEAD");
}
}
I am running it with following arguments -Duser.timezone=GMT+8 -server -Xms1536m -Xmx1536m -Xloggc:/home/datalord/logs/gc-2016-07-23-10-28-24.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC
on a server(2 cpus,2G memory) and it crashes pretty regular about once in two or three days with no OutOfMemoryError and no JVM error log.
Here is my analysis;
jmap -dump:format=b,file=soldier.bin
and using the Eclipse MAT to analyze the dump file.Here is the problem suspect which object occupies 280+ M bytes.The class "com.mysql.jdbc.NonRegisteringDriver", loaded by "sun.misc.Launcher$AppClassLoader @ 0xa0018490", occupies 281,118,144 (68.91%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "".
Keywords com.mysql.jdbc.NonRegisteringDriver java.util.concurrent.ConcurrentHashMap$Segment[] sun.misc.Launcher$AppClassLoader @ 0xa0018490.
I use c3p0-0.9.1.2 as mysql connection pool and mysql-connector-java-5.1.34 as jdbc connector and Ehcache-2.6.10 as memory cache.I have see all posts about 'com.mysql.jdbc.NonregisteringDriver memory leak' and still get no clue.
This problem has driven me crazy for several days, any advice or help will be appreciated!
**********************Supplementary description on 07-24****************
I use a JAVA WEB + ORM Framework called JFinal(github.com/jfinal/jfinal) which is open in github。 Here are some core code for further description about the problem.
/**
* CacheKit. Useful tool box for EhCache.
*
*/
public class CacheKit {
private static CacheManager cacheManager;
private static final Logger log = Logger.getLogger(CacheKit.class);
static void init(CacheManager cacheManager) {
CacheKit.cacheManager = cacheManager;
}
public static CacheManager getCacheManager() {
return cacheManager;
}
static Cache getOrAddCache(String cacheName) {
Cache cache = cacheManager.getCache(cacheName);
if (cache == null) {
synchronized(cacheManager) {
cache = cacheManager.getCache(cacheName);
if (cache == null) {
log.warn("Could not find cache config [" + cacheName + "], using default.");
cacheManager.addCacheIfAbsent(cacheName);
cache = cacheManager.getCache(cacheName);
log.debug("Cache [" + cacheName + "] started.");
}
}
}
return cache;
}
public static void put(String cacheName, Object key, Object value) {
getOrAddCache(cacheName).put(new Element(key, value));
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key) {
Element element = getOrAddCache(cacheName).get(key);
return element != null ? (T)element.getObjectValue() : null;
}
@SuppressWarnings("rawtypes")
public static List getKeys(String cacheName) {
return getOrAddCache(cacheName).getKeys();
}
public static void remove(String cacheName, Object key) {
getOrAddCache(cacheName).remove(key);
}
public static void removeAll(String cacheName) {
getOrAddCache(cacheName).removeAll();
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key, IDataLoader dataLoader) {
Object data = get(cacheName, key);
if (data == null) {
data = dataLoader.load();
put(cacheName, key, data);
}
return (T)data;
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key, Class<? extends IDataLoader> dataLoaderClass) {
Object data = get(cacheName, key);
if (data == null) {
try {
IDataLoader dataLoader = dataLoaderClass.newInstance();
data = dataLoader.load();
put(cacheName, key, data);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
return (T)data;
}
}
I use CacheKit like CacheKit.get("cfg_extract_rule_tree", extractRootId, new ExtractRuleTreeDataloader(extractRootId))
. and class ExtractRuleTreeDataloader
will be called if find nothing in cache by extractRootId
.
public class ExtractRuleTreeDataloader implements IDataLoader {
public static final Logger LOG = LoggerFactory.getLogger(ExtractRuleTreeDataloader.class);
private int ruleTreeId;
public ExtractRuleTreeDataloader(int ruleTreeId) {
super();
this.ruleTreeId = ruleTreeId;
}
@Override
public Object load() {
List<Record> ruleTreeList = Db.find("SELECT * FROM cfg_extract_fule WHERE root_id=?", ruleTreeId);
TreeHelper<ExtractRuleNode> treeHelper = ExtractUtil.batchRecordConvertTree(ruleTreeList);//convert List<Record> to and tree
if (treeHelper.isValidTree()) {
return treeHelper.getRoot();
} else {
LOG.warn("rule tree id :{} is an error tree #end#", ruleTreeId);
return null;
}
}
As I said before, I use JFinal ORM.The Db.find
method code is
public List<Record> find(String sql, Object... paras) {
Connection conn = null;
try {
conn = config.getConnection();
return find(config, conn, sql, paras);
} catch (Exception e) {
throw new ActiveRecordException(e);
} finally {
config.close(conn);
}
}
and the config close
method code is
public final void close(Connection conn) {
if (threadLocal.get() == null) // in transaction if conn in threadlocal
if (conn != null)
try {conn.close();} catch (SQLException e) {throw new ActiveRecordException(e);}
}
There is no transaction in my code,so I am pretty sure the conn.close() will be called every time.
**********************more description on 07-28****************
First, I use Ehcache to store the taskConfigs in the memory. And the taskConfigs almost never change, so I want store them in the memory eternally and store them to disk if the memory can not store them all.
I use MAT to find out the GC Roots of NonRegisteringDriver, and the result is show in the following picture. The Gc Roots of NonRegisteringDriver
But I still don't understand why the default behavior of Ehcache lead memory leak.The taskConfig is a class extends the Model class.
public class TaskConfig extends Model<TaskConfig> {
private static final long serialVersionUID = 5000070716569861947L;
public static TaskConfig DAO = new TaskConfig();
}
and the source code of Model is in this page(github.com/jfinal/jfinal/blob/jfinal-2.0/src/com/jfinal/plugin/activerecord/Model.java). And I can't find any reference (either directly or indirectly) to the connection object as @Jeremiah guessing.
Then I read the source code of NonRegisteringDriver, and don't understand why the map field connectionPhantomRefs
of NonRegisteringDriver
holds more than 5000 entrys of <ConnectionPhantomReference, ConnectionPhantomReference>
,but find no ConnectionImpl
in the queue field refQueue
of NonRegisteringDriver
. Because I see the cleanup code in class AbandonedConnectionCleanupThread
which means it will move the ref
in the NonRegisteringDriver.connectionPhantomRefs
while getting abandoned connection ref
from NonRegisteringDriver.refQueue
.
@Override
public void run() {
threadRef = this;
while (running) {
try {
Reference<? extends ConnectionImpl> ref = NonRegisteringDriver.refQueue.remove(100);
if (ref != null) {
try {
((ConnectionPhantomReference) ref).cleanup();
} finally {
NonRegisteringDriver.connectionPhantomRefs.remove(ref);
}
}
} catch (Exception ex) {
// no where to really log this if we're static
}
}
}
Appreciate the help offered by @Jeremiah !
Upvotes: 0
Views: 2353
Reputation: 43
The root cause of the java program is that the Linux OS runs out of memory and the OOM Killer kills the progresses. I found the log in /var/log/messages like following.
Aug 3 07:24:03 iZ233tupyzzZ kernel: Out of memory: Kill process 17308 (java) score 890 or sacrifice child
Aug 3 07:24:03 iZ233tupyzzZ kernel: Killed process 17308, UID 0, (java) total-vm:2925160kB, anon-rss:1764648kB, file-rss:248kB
Aug 3 07:24:03 iZ233tupyzzZ kernel: Thread (pooled) invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 3 07:24:03 iZ233tupyzzZ kernel: Thread (pooled) cpuset=/ mems_allowed=0
Aug 3 07:24:03 iZ233tupyzzZ kernel: Pid: 6721, comm: Thread (pooled) Not tainted 2.6.32-431.23.3.el6.x86_64 #1
I also find the default value of maxIdleTime
is 20 seconds in the C3p0Plugin
which is a c3p0 plugin in JFinal, So I think this is why the Object NonRegisteringDriver
occupies 280+ M bytes that shown in the MAT report. So I set the maxIdleTime
to 3600 seconds and the object NonRegisteringDriver
is no longer suspicious in the MAT report.
And I reset the jvm argements to -Xms512m -Xmx512m
. And the java program already has been running pretty well for several days. The Full Gc will be called as expected when the Old Gen is full.
Upvotes: 0
Reputation: 1145
From the comments above I'm almost certain your memory leak is actually memory usage from EhCache. The ConcurrentHashMap you're seeing is the one backing the MemoryStore, and I'm guessing that the taskConfig holds a reference (either directly or indirectly) to the connection object, which is why it's showing in your stack.
Having eternal="true" in the default cache makes it so the inserted objects are never allowed to expire. Even without that, the timeToLive and timeToIdle values default to an infinite lifetime!
Combine that with the default behavior of Ehcache when retrieving elements is to copy them (last I checked), through serialization! You're just stacking new Object references up each time the taskConfig is extracted and put back into the ehcache.
The best way to test this (in my opinion) is to change your default cache configuration. Change eternal to false, and implement a timeToIdle value. timeToIdle is a time (in seconds) that a value may exist in the cache without being accessed.
<ehcache> <diskStore path="java.io.tmpdir"/> <defaultCache maxElementsInMemory="10000" eternal="false" timeToIdle="120" overflowToDisk="true" diskPersistent="false" diskExpiryThreadIntervalSeconds="120"/>
If that works, then you may want to look into further tweaking your ehcache configuration settings, or providing a more customized cache reference other than default for your class.
There are multiple performance considerations when tweaking the ehcache. I'm sure that there is a better configuration for your business model. The Ehcache documentation is good, but I found the site to be a bit scattered when I was trying to figure it out. I've listed some links that I found useful below.
http://www.ehcache.org/documentation/2.8/configuration/cache-size.html
http://www.ehcache.org/documentation/2.8/configuration/configuration.html
Good luck!
To test your memory leak try the following:
If it returns false, that is your memory leak. Override
equals
andhash
in your TaskConfig Object and rerun the test.
Upvotes: 1