Cong Wang

Reputation: 93

possible bug in Flow

Possible Bug:

When child B failed, the handleStartChildWorkflowExecutionFailed method in GenericWorkflowClientImpl removed the "OpenRequestInfo" entry from the scheduledExternalWorkflows map, which is keyed by workflow id. Because all 5 child workflows share the same workflow id, the map became empty as soon as child B failed to start. As a result, the parent workflow cannot complete: the requests for the other 4 child workflows can never be closed properly in the handle* methods.

Line 335 shows where handleStartChildWorkflowExecutionFailed removes the failed entry:

https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-swf-libraries/src/main/java/com/amazonaws/services/simpleworkflow/flow/worker/GenericWorkflowClientImpl.java#L335
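
To illustrate the suspected mechanism outside the framework, here is a minimal sketch using a plain HashMap keyed by workflow id. The OpenRequest class and both method names are hypothetical stand-ins, not the real OpenRequestInfo or any Flow API; the sketch only assumes the map behaves like a java.util.Map keyed by workflow id.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedWorkflowIdDemo {

    // Hypothetical stand-in for the framework's OpenRequestInfo; not the real class.
    static final class OpenRequest {
        final String label;

        OpenRequest(String label) {
            this.label = label;
        }
    }

    // Tracks five child requests under ONE shared workflow id, then removes the
    // entry for that id once (as a failure handler keyed by workflow id would).
    // Returns how many requests are still tracked afterwards.
    static int trackedAfterOneFailure(String sharedWorkflowId) {
        Map<String, OpenRequest> scheduled = new HashMap<>();
        for (String label : new String[] {"A", "B", "C", "D", "E"}) {
            // Same key every time, so each put replaces the previous entry.
            scheduled.put(sharedWorkflowId, new OpenRequest(label));
        }
        scheduled.remove(sharedWorkflowId); // child B fails to start
        return scheduled.size();
    }

    // Same scenario, but every child gets its own workflow id.
    static int trackedWithDistinctIds(String baseWorkflowId) {
        Map<String, OpenRequest> scheduled = new HashMap<>();
        for (String label : new String[] {"A", "B", "C", "D", "E"}) {
            scheduled.put(baseWorkflowId + "-" + label, new OpenRequest(label));
        }
        scheduled.remove(baseWorkflowId + "-B"); // only child B's entry goes away
        return scheduled.size();
    }

    public static void main(String[] args) {
        System.out.println(trackedAfterOneFailure("child-wf"));  // prints 0
        System.out.println(trackedWithDistinctIds("child-wf"));  // prints 4
    }
}
```

With a shared id, a single failure leaves nothing tracked for the remaining children; with distinct ids, the other four entries survive.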

Upvotes: 0

Views: 248

Answers (2)

Cong Wang

Reputation: 93

@Override
protected ExternalTaskCancellationHandler doExecute(final ExternalTaskCompletionHandle handle) throws Throwable {
    context.setCompletionHandle(handle);
    String workflowId = attributes.getWorkflowId();
    // A child with this workflow id is already tracked: fail the new request
    // right away instead of overwriting the existing map entry.
    if (scheduledExternalWorkflows.containsKey(workflowId)) {
        WorkflowExecution workflowExecution = new WorkflowExecution();
        workflowExecution.setWorkflowId(workflowId);
        WorkflowType workflowType = attributes.getWorkflowType();

        // This failure is generated locally, so there is no real event id for it.
        long fakeEventId = -1;
        handle.fail(new StartChildWorkflowFailedException(fakeEventId, workflowExecution, workflowType,
                StartChildWorkflowExecutionFailedCause.WORKFLOW_ALREADY_RUNNING.toString()));

        return new ChildWorkflowCancellationHandler(workflowId, handle);
    }
    decisions.startChildWorkflowExecution(attributes);
    scheduledExternalWorkflows.put(workflowId, context);
    return new ChildWorkflowCancellationHandler(workflowId, handle);
}

I updated the code at line 162 of this class as shown above:

https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-swf-libraries/src/main/java/com/amazonaws/services/simpleworkflow/flow/worker/GenericWorkflowClientImpl.java#L163

Upvotes: 0

Maxim Fateev

Reputation: 6870

Update2: The issue is still not fixed :(.

Update: It might be that the bug was fixed by commit 0a183e02b29b06e9324b740af40daff9193c9290. Please verify.

It looks like a bug in the DecisionsHelper. It assumes that a DecisionId is never reused, since a DecisionId is never removed from the decisions map. A DecisionId is never reused for activities and lambdas, but as you discovered, that is not always true for child workflows :(. The workaround is to not reuse the child workflow id.

In your case I don't see a reason to try to schedule a child workflow with the same id: the parent workflow has complete information about the state of the child workflow and can easily avoid it. BTW, have you considered using CronInvocationSchedule with AsyncScheduledExecutor, as in the CronWithRetry sample?

But this bug is nasty in cases where a child workflow with the same id can be created by multiple instances of a parent workflow.
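
One way to follow the "don't reuse the child workflow id" workaround is to derive each id from the parent's run id plus a counter. This is only a sketch: ChildWorkflowIdGenerator, nextId, and the id format are all hypothetical and not part of the Flow Framework. It uses a counter rather than a random value on the assumption that workflow code should produce the same ids when it is replayed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not a Flow Framework class.
public class ChildWorkflowIdGenerator {

    private final String parentRunId;
    private int nextSequence = 0;

    public ChildWorkflowIdGenerator(String parentRunId) {
        this.parentRunId = parentRunId;
    }

    // Returns a new id on every call: base name, parent run id, then a sequence number.
    // Including the parent run id keeps ids from different parent runs apart.
    public String nextId(String baseName) {
        nextSequence++;
        return baseName + "-" + parentRunId + "-" + nextSequence;
    }

    public static void main(String[] args) {
        ChildWorkflowIdGenerator ids = new ChildWorkflowIdGenerator("run-123");
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 5; i++) {
            seen.add(ids.nextId("child"));
        }
        System.out.println(seen.size()); // prints 5: all five ids are distinct
    }
}
```

The generated id would then be set on the child workflow's start attributes before scheduling it, so no two start requests from the same parent share a key.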

Upvotes: 1
