Reputation: 377
Cadence workflows are required to be deterministic, which means that a workflow is expected to produce the exact same results if it’s executed with the same input parameters.
When I learned about this requirement as a new Cadence user, I wondered how I could maintain workflows in the long run when determinism-breaking changes are required.
An example scenario is a workflow that executes Activity1 and Activity2 consecutively, which you then need to change so that it executes Activity2 before Activity1. There are many other ways to make determinism-breaking changes like this, and I wanted to understand how to handle them.
This is especially important in cases where the workflows can run for long durations such as days, weeks, or even months!
Upvotes: 1
Views: 1053
Reputation: 377
This is probably one of the most common questions a new Cadence developer asks. Cadence workflows are required to be deterministic algorithms. If a workflow algorithm isn’t deterministic, Cadence workers risk hitting nondeterministic workflow errors when they replay the history (i.e. during worker failure recovery).
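To make the failure mode concrete, here is a toy sketch in plain Go of a replay check. It is not how the Cadence client is implemented; `replay`, `errNonDeterministic`, and the string-based history are hypothetical names invented for illustration. The idea is only that the recorded history and the commands produced by the current code must agree, event by event.

```go
package main

import (
	"errors"
	"fmt"
)

// errNonDeterministic is a stand-in for the nondeterministic workflow error
// described above. The real Cadence client reports it differently.
var errNonDeterministic = errors.New("nondeterministic workflow")

// replay compares the activities recorded in a workflow's history with the
// activities the current workflow code wants to schedule, in order.
// A history shorter than the scheduled list is fine: the workflow simply
// has not progressed that far yet.
func replay(history, scheduled []string) error {
	for i, recorded := range history {
		if i >= len(scheduled) || scheduled[i] != recorded {
			return fmt.Errorf("%w: event %d recorded %q", errNonDeterministic, i, recorded)
		}
	}
	return nil
}

func main() {
	// History written by the original code: Activity1, then Activity2.
	history := []string{"Activity1", "Activity2"}

	// The original code replays cleanly against its own history.
	fmt.Println(replay(history, []string{"Activity1", "Activity2"}))

	// The reordered code disagrees with the history at the first event.
	fmt.Println(replay(history, []string{"Activity2", "Activity1"}))
}
```

In this sketch the second call returns an error because the reordered code asks for Activity2 where the history recorded Activity1, which is the situation from the question.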
There are two ways to solve this problem:
Below is an example of the GetVersion() based approach. Let’s assume you want to change the following line in your workflow:
err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
to
err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
This is a breaking change since it runs the bar activity instead of foo. If you simply make that change without worrying about determinism, any workflow that needs to be replayed will fail to do so and will be stuck with the nondeterministic workflow error. The correct way to make this change is to update the workflow as follows:
v := workflow.GetVersion(ctx, "fooChange", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
	// Instances started before the change keep running foo.
	err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
} else {
	// New instances (version 1) run bar.
	err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
}
The GetVersion function accepts 4 parameters: the workflow context (ctx), a change ID string that identifies the change (“fooChange” above), the minimum supported version (minSupportedVersion), and the maximum supported version (maxSupportedVersion).
When a new instance of this workflow reaches the GetVersion() call above for the first time, the function will return the maxSupportedVersion parameter so that you can run the latest version of your workflow algorithm. At the same time, it’ll also record that version number in the workflow history (internally known as a Marker Event) so that it is remembered in the future. When replaying this workflow later on, the Cadence client will keep returning the same version number even if you pass a different maxSupportedVersion parameter (i.e. if your workflow has even more versions).
If the GetVersion call is encountered during a history replay and the history doesn’t have a marker event that was logged earlier, the function will return DefaultVersion, on the assumption that “fooChange” never existed in the context of this workflow instance.
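The rules in the last two paragraphs can be sketched with a small stand-alone model. This is not Cadence code: `toyHistory` and its `getVersion` method are hypothetical, the history is reduced to a map of markers plus a replay flag, and `defaultVersion` is set to -1 here as an assumption about the value of DefaultVersion in the Go client.

```go
package main

import "fmt"

// defaultVersion plays the role of DefaultVersion in this sketch.
const defaultVersion = -1

// toyHistory is a hypothetical stand-in for a workflow history. It tracks
// only version markers and whether the workflow is currently being replayed.
type toyHistory struct {
	markers   map[string]int // changeID -> version recorded by a marker
	replaying bool
}

// getVersion imitates the behaviour described above.
func (h *toyHistory) getVersion(changeID string, maxSupported int) int {
	if v, ok := h.markers[changeID]; ok {
		// A marker exists: always return the recorded version,
		// no matter what maxSupported the current code passes.
		return v
	}
	if h.replaying {
		// Replaying a history that was written before the change existed.
		return defaultVersion
	}
	// First execution of this call: record a marker and use the latest version.
	h.markers[changeID] = maxSupported
	return maxSupported
}

func main() {
	// A new instance reaches the call for the first time with max version 1.
	fresh := &toyHistory{markers: map[string]int{}}
	fmt.Println(fresh.getVersion("fooChange", 1))

	// The same instance is replayed later by code whose max version is 2.
	fresh.replaying = true
	fmt.Println(fresh.getVersion("fooChange", 2))

	// An instance whose history predates the change is replayed.
	old := &toyHistory{markers: map[string]int{}, replaying: true}
	fmt.Println(old.getVersion("fooChange", 2))
}
```

The three calls correspond to the three cases in the text: first execution, replay with a recorded marker, and replay of a history that has no marker.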
In case you need to make one more breaking change in the same step of your workflow, you simply need to change the code above like this:
v := workflow.GetVersion(ctx, "fooChange", workflow.DefaultVersion, 2) // Note the new max version
if v == workflow.DefaultVersion {
	err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
} else if v == 1 {
	err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
} else { // This is the Version 2 logic
	err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
}
When you are comfortable with dropping support for the original (DefaultVersion) logic, you change the code above like this:
v := workflow.GetVersion(ctx, "fooChange", 1, 2) // DefaultVersion is no longer supported
if v == 1 {
	err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
} else {
	err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
}
After this change, if your workflow code runs for an old workflow instance that is still on DefaultVersion, the Cadence client will raise an error and stop the execution.
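The check behind that error can be pictured as a simple range test. The helper below is a hypothetical sketch in plain Go, not the Cadence client's actual validation code, and the error text is made up; `defaultVersion` is again assumed to be -1.

```go
package main

import "fmt"

// defaultVersion plays the role of DefaultVersion in this sketch.
const defaultVersion = -1

// checkSupported is a hypothetical helper: it rejects a recorded version
// that falls outside the range the current workflow code still supports.
func checkSupported(changeID string, recorded, minSupported, maxSupported int) error {
	if recorded < minSupported || recorded > maxSupported {
		return fmt.Errorf("change %q: recorded version %d is outside supported range [%d, %d]",
			changeID, recorded, minSupported, maxSupported)
	}
	return nil
}

func main() {
	// An old instance that never recorded a marker for "fooChange".
	fmt.Println(checkSupported("fooChange", defaultVersion, 1, 2))

	// An instance that recorded version 1 is still supported.
	fmt.Println(checkSupported("fooChange", 1, 1, 2))
}
```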
Eventually, you’ll probably want to get rid of all previous versions and only support the latest version. One option is to remove the GetVersion call and the if statement altogether and keep a single line of code that does the right thing. However, it’s actually a better idea to keep the GetVersion() call in there for two reasons:
Considering the two reasons mentioned above, you should update your workflow code like the following when it’s time to drop support for all old versions:
workflow.GetVersion(ctx, "fooChange", 2, 2) // This acts like an assertion to give you a proper error
err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
Upvotes: 1