DR_S

Reputation: 102

Resource Allocation for Incremental Pipelines

Sometimes an incremental pipeline in Palantir Foundry has to be built as a snapshot. If the data is large, we increase the resources for the build to reduce run time, and then remove that configuration after the first snapshot run. Is there a way to set the configuration conditionally? That is, if the pipeline is running in incremental mode, use the default resource allocation, and otherwise use a specified set of resources.

Example: if the pipeline runs as a snapshot transaction, the configuration below has to be applied:

@configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"]) 

If incremental, then the default one.

Upvotes: 1

Views: 228

Answers (1)

fmsf

Reputation: 37137

The @configure and @incremental decorators are set during the CI execution, while the actual code inside the function annotated with @transform_df or @transform runs at build time.
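To illustrate the timing in plain Python (no Foundry APIs involved; `record_profile` and `events` are hypothetical names made up for this sketch): decorator arguments are evaluated when the module is loaded, which is roughly the point CI sees, while the function body only runs when the function is called.

```python
events = []


def record_profile(profile):
    """Hypothetical stand-in for a decorator such as @configure."""
    # This line runs when the decorated function is defined (module load).
    events.append(("decorated", tuple(profile)))

    def wrap(func):
        func.profile = tuple(profile)
        return func

    return wrap


@record_profile(["NUM_EXECUTORS_8"])
def compute(rows):
    # This line runs only when compute() is actually invoked.
    events.append(("called", len(rows)))
    return rows


# Snapshot of what has happened before any call was made.
events_after_definition = list(events)
compute([1, 2, 3])
```

By the time the body executes, the profile has already been attached, so nothing inside `compute` can change it.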

This means that you can't programmatically switch between them after the CI has passed. What you can do, however, is keep a constant or configuration value within your repo and change it at code level whenever you want to switch. Please make sure you understand how semantic versioning works before attempting this. For example:

from transforms.api import Input, Output, configure, incremental, transform_df

# Change this constant and commit to switch modes; the choice is fixed once CI has passed.
IS_INCREMENTAL = True
SEMANTIC_VERSION = 1


def mytransform(input1, input2):
    # Shared logic used by both variants of the transform.
    return input1.join(input2, "foo", "left")


if IS_INCREMENTAL:
    @incremental(semantic_version=SEMANTIC_VERSION)
    @transform_df(
        Output("foo"),
        input1=Input("bar"),
        input2=Input("foobar"))
    def compute(input1, input2):
        return mytransform(input1, input2)
else:
    @configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"])
    @transform_df(
        Output("foo"),
        input1=Input("bar"),
        input2=Input("foobar"))
    def compute(input1, input2):
        return mytransform(input1, input2)
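If duplicating the decorated function in both branches bothers you, the same idea can be sketched by choosing the decorator stack from the flag. This is plain Python with hypothetical stand-ins (`tag`, `apply_decorators`, `build_compute`), not Foundry code; whether your repository's checks pick up a transform built this way is something to verify yourself.

```python
def apply_decorators(func, decorators):
    """Apply decorators so that the first one in the list ends up outermost."""
    for decorator in reversed(decorators):
        func = decorator(func)
    return func


def tag(name):
    """Hypothetical stand-in decorator that records the order of wrappers."""
    def wrap(func):
        func.tags = [name] + getattr(func, "tags", [])
        return func
    return wrap


def build_compute(is_incremental):
    def compute(input1, input2):
        return input1 + input2

    # Pick the outer decorator from the flag; the inner one is shared.
    mode = tag("incremental") if is_incremental else tag("configure")
    return apply_decorators(compute, [mode, tag("transform_df")])
```

Calling `build_compute(True)` yields a function tagged as incremental, and `build_compute(False)` yields one tagged with the resource profile, mirroring the two branches above.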

Upvotes: 1
