How can I version Cadence workflows?

Question

Cadence workflows are required to be deterministic, which means that a workflow is expected to produce the exact same results if it’s executed with the same input parameters.

When I learned about this requirement as a new Cadence user, I wondered how I could maintain workflows in the long run when determinism-breaking changes are required.

An example scenario: you have a workflow that executes Activity1 and Activity2 consecutively, and then you need to swap the order so that the workflow executes Activity2 before Activity1. There are many other ways to make determinism-breaking changes like this, and I wanted to understand how to handle them.

This is especially important in cases where the workflows can run for long durations such as days, weeks, or even months!

Emrah Seker · Accepted Answer

This is probably one of the most common questions new Cadence developers ask. Cadence workflows are required to be deterministic algorithms. If a workflow algorithm isn’t deterministic, Cadence workers risk hitting nondeterministic workflow errors when they replay the history (e.g. during worker failure recovery).

There are two ways to solve this problem:

Creating a brand-new workflow: This is the most naive approach to versioning workflows, and it is as simple as it sounds. Any time you need to change your workflow’s algorithm, you make a copy of the original workflow, edit it the way you want, give it a new name like MyWorkflow_V2, and start using it for all new instances going forward. If your workflow is not very long-lived, the existing instances will “drain out” at some point and you’ll be able to delete the old version altogether. On the other hand, this approach can quickly turn into a maintenance nightmare, because every breaking change leaves you with another full copy of the workflow to keep alive.
Using the GetVersion() API to fork workflow logic: The Cadence client has a function named GetVersion, which tells you which version of the workflow is currently running. You can use the value returned by this function to decide which version of your workflow algorithm to run. In other words, your workflow code contains both the old and new algorithms side by side, and each workflow instance picks the version it started with, which keeps it deterministic.
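As a rough illustration of the first approach, the toy registry below keeps both copies registered under different names. This is a sketch under assumptions, not Cadence code: WorkflowFunc, registry, LatestName and Run are hypothetical names, and a workflow is reduced to the list of activities it would run.

```go
package main

import "fmt"

// WorkflowFunc is a hypothetical stand-in for a workflow function; it returns
// the activities it would run, in order.
type WorkflowFunc func() []string

// registry maps a workflow type name to its implementation. Both copies stay
// registered until every instance started under the old name has finished.
var registry = map[string]WorkflowFunc{
	"MyWorkflow":    func() []string { return []string{"Activity1", "Activity2"} },
	"MyWorkflow_V2": func() []string { return []string{"Activity2", "Activity1"} },
}

// LatestName is the type name used when starting new instances.
const LatestName = "MyWorkflow_V2"

// Run looks up the implementation by the name the instance was started with,
// so old instances keep running the old code.
func Run(name string) ([]string, error) {
	fn, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("no workflow registered as %q", name)
	}
	return fn(), nil
}

func main() {
	oldSteps, err := Run("MyWorkflow") // an instance started before the change
	fmt.Println(oldSteps, err)

	newSteps, err := Run(LatestName) // an instance started after the change
	fmt.Println(newSteps, err)
}
```

Deleting the "MyWorkflow" entry is only safe once no running instance still refers to it, which is why this approach gets painful for long-running workflows.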

Below is an example of the GetVersion() based approach. Let’s assume you want to change the following line in your workflow:

err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)

to

err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)

This is a breaking change since it runs the bar activity instead of foo. If you simply make that change without worrying about determinism, any workflow instance that needs to be replayed will fail and get stuck with the nondeterministic workflow error. The correct way to make this change is to update the workflow as follows:

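// Fork on the version recorded for the "fooChange" change ID.
v := workflow.GetVersion(ctx, "fooChange", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
    // Instances that started before the change keep running foo.
    err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
} else {
    // New instances run bar.
    err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
}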

The GetVersion function accepts four parameters:

ctx is the standard context object
“fooChange” is a human-readable ChangeID that names the semantic change you are making in your workflow algorithm that breaks determinism
DefaultVersion is a constant that denotes the very first version of the code, before any versioned change was introduced (in the Go client its value is -1, not 0). It’s passed as the minSupportedVersion parameter to the GetVersion function
1 is the maxSupportedVersion that can be handled by your current workflow code. In this case, our algorithm can support workflow versions from DefaultVersion to Version 1 (inclusive).

When a new instance of this workflow reaches the GetVersion() call above for the first time, the function returns the maxSupportedVersion parameter so that you run the latest version of your workflow algorithm. At the same time, it records that version number in the workflow history (internally known as a Marker Event) so that it is remembered in the future. When this workflow is replayed later on, the Cadence client keeps returning the same version number even if you pass a different maxSupportedVersion parameter (i.e. if your workflow code has gained even more versions since).

If the GetVersion call is encountered during a history replay and the history doesn’t have a marker event that was logged earlier, the function will return DefaultVersion, with the assumption that the “fooChange” had never existed in the context of this workflow instance.
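The three rules above (first execution records the max version, replay returns the recorded version, replay without a marker returns DefaultVersion) can be modelled in a few lines. This is a toy sketch, not the Cadence client: the Instance type, its fields and this GetVersion method are hypothetical, the value -1 for DefaultVersion is treated as an assumption of the model, and range validation against minSupported is deliberately left out.

```go
package main

import "fmt"

// DefaultVersion stands for "code that predates the change".
// The value -1 is an assumption of this toy model.
const DefaultVersion = -1

// Instance is a toy workflow instance that only tracks version markers.
type Instance struct {
	Markers   map[string]int // changeID -> version recorded in the history
	Replaying bool           // true while re-executing already recorded history
}

// GetVersion models the rules described in the text. minSupported is accepted
// to mirror the four-parameter shape but is not validated in this sketch.
func (w *Instance) GetVersion(changeID string, minSupported, maxSupported int) int {
	if v, recorded := w.Markers[changeID]; recorded {
		// A marker exists: always return it, whatever maxSupported is now.
		return v
	}
	if w.Replaying {
		// Replaying history that has no marker: the change did not exist
		// when this instance passed this point.
		return DefaultVersion
	}
	// First execution: pick the newest version and record a marker.
	w.Markers[changeID] = maxSupported
	return maxSupported
}

func main() {
	fresh := &Instance{Markers: map[string]int{}}
	fmt.Println(fresh.GetVersion("fooChange", DefaultVersion, 1)) // 1

	// The same instance replayed later by code that supports up to version 2.
	fresh.Replaying = true
	fmt.Println(fresh.GetVersion("fooChange", DefaultVersion, 2)) // 1

	// An instance that ran before "fooChange" existed, now being replayed.
	old := &Instance{Markers: map[string]int{}, Replaying: true}
	fmt.Println(old.GetVersion("fooChange", DefaultVersion, 2)) // -1
}
```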

In case you need to make one more breaking change in the same step of your workflow, you simply need to change the code above like this:

v := workflow.GetVersion(ctx, "fooChange", workflow.DefaultVersion, 2) // Note the new max version
if v == workflow.DefaultVersion {
    err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
} else if v == 1 {
    err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
} else { // This is the Version 2 logic
    err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
}

When you are comfortable dropping support for DefaultVersion (the original, pre-change code path), change the code above like this:

v := workflow.GetVersion(ctx, "fooChange", 1, 2) // DefaultVersion is no longer supported
if v == 1 {
    err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
} else {
    err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
}

After this change, if your workflow code runs for an old workflow instance that is still on DefaultVersion, the Cadence client will raise an error and stop the execution.

Eventually, you’ll probably want to get rid of all previous versions and only support the latest one. One option is to remove the GetVersion call and the if statement altogether and keep a single line of code that does the right thing. However, it’s a better idea to keep the GetVersion() call in there, for two reasons:

GetVersion() gives you a better idea of what went wrong if your worker attempts to replay the history of an old workflow instance. Instead of investigating the root cause of a mysterious nondeterministic workflow error, you’ll know that the failure is caused by workflow versioning at this location.
If you need to make more breaking changes to the same step of your workflow algorithm, you’ll be able to reuse the same Change ID and continue following the same pattern as you did above.

Considering these two reasons, you should update your workflow code as follows when it’s time to drop support for all old versions:

workflow.GetVersion(ctx, "fooChange", 2, 2) // This acts like an assertion to give you a proper error
err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
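Both the narrowed call (1, 2) and the assertion-style call (2, 2) rely on the same idea: a recorded version outside the supported range is rejected with a clear message. The sketch below models only that range check. CheckSupported is a hypothetical helper, not part of the Cadence client, and -1 for DefaultVersion is again an assumption of the model.

```go
package main

import "fmt"

// DefaultVersion stands for "code that predates the change".
// The value -1 is an assumption of this toy model.
const DefaultVersion = -1

// CheckSupported reports whether a version recorded for changeID can be
// handled by code that supports the inclusive range [minSupported, maxSupported].
func CheckSupported(changeID string, recorded, minSupported, maxSupported int) error {
	if recorded < minSupported {
		return fmt.Errorf("change %q: recorded version %d is older than the minimum supported version %d",
			changeID, recorded, minSupported)
	}
	if recorded > maxSupported {
		return fmt.Errorf("change %q: recorded version %d is newer than the maximum supported version %d",
			changeID, recorded, maxSupported)
	}
	return nil
}

func main() {
	// Old instance on DefaultVersion, code now supports 1..2: rejected.
	fmt.Println(CheckSupported("fooChange", DefaultVersion, 1, 2))

	// Instance on version 1, code only supports 2..2: rejected.
	fmt.Println(CheckSupported("fooChange", 1, 2, 2))

	// Instance on version 2, code supports 2..2: accepted.
	fmt.Println(CheckSupported("fooChange", 2, 2, 2))
}
```

The error names the change ID and the versions involved, which is the diagnostic advantage over a generic nondeterminism failure.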
