Reputation: 21
I'm seeing strange behaviour with some parameters in WebLogic. I have a J2EE batch job that runs for more than 10 minutes on a WebLogic server, which causes an exception like:
com.ibm.jbatch.container.exception.BatchContainerRuntimeException: java.lang.InterruptedException
After some investigation, I found that the property MaxStuckThreadTime is set to 600 seconds (the default value) and the property StuckThreadCount is set to 25 (it was 0 in the past, without any issue). If I understand correctly, this means the server should fail if and only if at least 25 threads have been busy for more than 600 seconds. But I have at most 10 threads running at the same time on the server. I ran some tests in my dev environment, and as soon as one thread is stuck (busy for 10 minutes), the InterruptedException is thrown. Is this the expected behaviour?
I don't have the right to modify these values in production, so any idea for working around this error is welcome.
In the documentation, I found:
StuckThreadCount = The number of stuck threads after which the server is transitioned into FAILED state.
MaxStuckThreadTime = Sets the value of the MaxStuckThreadTime attribute.
So, from my point of view, the InterruptedException should only appear if both conditions are fulfilled, but I have the impression that a single stuck thread is enough to interrupt the batch. Am I correct in saying that MaxStuckThreadTime is only taken into account if StuckThreadCount is not 0?
Thanks in advance for your help
Edit:
I tried to implement the proposal below but, so far, without success. In my weblogic-ejb-jar.xml, I added the following:
<work-manager>
  <name>BatchWorkManager</name>
  <ignore-stuck-threads>true</ignore-stuck-threads>
</work-manager>
<managed-executor-service>
  <name>batch-job-executor</name>
  <dispatch-policy>BatchWorkManager</dispatch-policy>
  <long-running-priority>10</long-running-priority>
</managed-executor-service>
and in my batch, I added
@Resource(name = "BatchWorkManager")
WorkManager myWM;
and I call my batch like this:
@Override
public String process() throws Exception {
    myWM.schedule(new MyWork("MyBatchName"));
    return BatchStatus.COMPLETED.toString();
}
After a few minutes (the duration defined by the MaxStuckThreadTime parameter), the job is set to FAILED status. If I debug the code, I see these values on the work manager:
stuckThreadActions = null name = "NO STUCK THREAD ACTIONS !" stuckThreads = {BitSet@36226} "{}"
It seems the work manager is set up correctly ("NO STUCK THREAD ACTIONS !" is what I want). So I still don't understand why the batch is failing. Any help is welcome.
For information, here is the stack trace I receive:
###<Apr 21, 2022, 12:40:00,793 PM CEST> <com.ibm.jbatch.container.impl.BatchletStepControllerImpl> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <33ef2b10-13cc-45be-bf47-e06daf40042c-0000003b> <1650537600793> <[severity-value: 16] [rid: 0:1] [partition-id: 0] [partition-name: DOMAIN] > <Caught exception executing step: com.ibm.jbatch.container.exception.BatchContainerRuntimeException: java.lang.InterruptedException
    at com.ibm.jbatch.container.impl.PartitionedStepControllerImpl.executeAndWaitForCompletion(PartitionedStepControllerImpl.java:407)
    at com.ibm.jbatch.container.impl.PartitionedStepControllerImpl.invokeCoreStep(PartitionedStepControllerImpl.java:297)
    at com.ibm.jbatch.container.impl.BaseStepControllerImpl.execute(BaseStepControllerImpl.java:144)
    at com.ibm.jbatch.container.impl.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:112)
    at com.ibm.jbatch.container.impl.JobThreadRootControllerImpl.originateExecutionOnThread(JobThreadRootControllerImpl.java:110)
    at com.ibm.jbatch.container.util.BatchWorkUnit.run(BatchWorkUnit.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at weblogic.work.concurrent.TaskWrapper.call(TaskWrapper.java:151)
    at weblogic.work.concurrent.future.AbstractFutureImpl.runTask(AbstractFutureImpl.java:391)
    at weblogic.work.concurrent.future.AbstractFutureImpl.doRun(AbstractFutureImpl.java:436)
    at weblogic.work.concurrent.future.ManagedFutureImpl.run(ManagedFutureImpl.java:28)
    at weblogic.invocation.ComponentInvocationContextManager._runAs(ComponentInvocationContextManager.java:348)
    at weblogic.invocation.ComponentInvocationContextManager.runAs(ComponentInvocationContextManager.java:333)
    at weblogic.work.LivePartitionUtility.doRunWorkUnderContext(LivePartitionUtility.java:54)
    at weblogic.work.PartitionUtility.runWorkUnderContext(PartitionUtility.java:41)
    at weblogic.work.SelfTuningWorkManagerImpl.runWorkUnderContext(SelfTuningWorkManagerImpl.java:640)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:406)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:346)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at com.ibm.jbatch.container.impl.PartitionedStepControllerImpl.executeAndWaitForCompletion(PartitionedStepControllerImpl.java:402)
    ... 17 more
Upvotes: 0
Views: 496
Reputation: 2133
You could configure a new work manager for running the batch job and configure stuck threads to be ignored, or launch the batch job as a long running request.
A work manager can be configured globally via the WebLogic console, or locally for each deployed application. To define a work manager in an application, you can configure it in the weblogic.xml (or the equivalent for EAR files) packaged with your deployment. For example, I have this in my weblogic.xml file to define a work manager that ignores stuck threads...
<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/weblogic/weblogic-web-app http://xmlns.oracle.com/weblogic/weblogic-web-app/1.4/weblogic-web-app.xsd">
  ...
  <work-manager>
    <name>batch-job-wm</name>
    <max-threads-constraint>
      <name>batch-job-max-threads</name>
      <count>10</count>
    </max-threads-constraint>
    <ignore-stuck-threads>true</ignore-stuck-threads>
  </work-manager>
  <managed-executor-service>
    <name>batch-job-executor</name>
    <dispatch-policy>batch-job-wm</dispatch-policy>
    <long-running-priority>10</long-running-priority>
    <max-concurrent-long-running-requests>10</max-concurrent-long-running-requests>
  </managed-executor-service>
  <resource-env-description>
    <resource-env-ref-name>concurrent/batch-job-executor</resource-env-ref-name>
    <resource-link>batch-job-executor</resource-link>
  </resource-env-description>
  ...
</weblogic-web-app>
I reference that managed-executor-service in my web.xml...
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd" version="3.0">
  ...
  <resource-env-ref>
    <resource-env-ref-name>concurrent/batch-job-executor</resource-env-ref-name>
    <resource-env-ref-type>javax.enterprise.concurrent.ManagedExecutorService</resource-env-ref-type>
  </resource-env-ref>
</web-app>
In my web application, I can then access that task executor as follows...
@Configuration
public class ResourceConfig {
    @Bean
    public TaskExecutor batchTaskExecutor() {
        DefaultManagedTaskExecutor taskExecutor = new DefaultManagedTaskExecutor();
        taskExecutor.setJndiName("java:comp/env/concurrent/batch-job-executor");
        return taskExecutor;
    }
}
When launching a batch job using that work manager, any stuck threads are ignored by WebLogic and the servers show as healthy even for long running tasks.
An enhancement to this is to have the batch job launched as a long running task. I think this will cause WebLogic to create a new thread for the task instead of taking a thread from the work manager thread pool. Also, WebLogic won't consider a thread assigned to a long running task as being stuck.
To launch a long running task, you need to set the LONGRUNNING_HINT to true in the ManagedTask that is launched. For more details see the following...
https://docs.oracle.com/javaee/7/api/javax/enterprise/concurrent/ManagedTask.html#LONGRUNNING_HINT
https://docs.oracle.com/javaee/7/api/javax/enterprise/concurrent/ManagedExecutorService.html
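As a rough illustration of the hint, here is a minimal sketch. To stay compilable outside a container it uses only the JDK: the string literal is what I believe `ManagedTask.LONGRUNNING_HINT` resolves to, and `LongRunningHintSketch`, `longRunningProperties`, `myBatchRunnable` and `executor` are hypothetical names. In a real deployment the map would be returned from `ManagedTask.getExecutionProperties()` or passed to `ManagedExecutors.managedTask(...)`, as the comments show. I have not verified how WebLogic treats the resulting thread, so take it as a starting point only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class LongRunningHintSketch {

    // Assumed to match the value of javax.enterprise.concurrent.ManagedTask.LONGRUNNING_HINT;
    // inside the container, reference that constant instead of this literal.
    static final String LONGRUNNING_HINT = "javax.enterprise.concurrent.LONGRUNNING_HINT";

    // Hypothetical helper: builds the execution properties that mark a task as long running.
    static Map<String, String> longRunningProperties() {
        Map<String, String> props = new HashMap<>();
        props.put(LONGRUNNING_HINT, "true");
        return Collections.unmodifiableMap(props);
    }

    public static void main(String[] args) {
        Map<String, String> props = longRunningProperties();
        System.out.println(props);

        // Inside the container (Java EE 7 concurrency API) the map would be used like this,
        // where myBatchRunnable is your job and executor is the ManagedExecutorService
        // looked up from java:comp/env/concurrent/batch-job-executor:
        //   Runnable task = ManagedExecutors.managedTask(myBatchRunnable, props, null);
        //   executor.submit(task);
    }
}
```

The alternative is to have the task class implement `ManagedTask` itself and return this map from `getExecutionProperties()`; wrapping with `ManagedExecutors.managedTask` just avoids touching the existing Runnable.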
Upvotes: 0