sumitkanoje

Reputation: 1245

What is the difference between transform & transform_df in Palantir Foundry?

Can someone explain why we need transform & transform_df methods separately?

Upvotes: 11

Views: 2037

Answers (2)

Grigory Sharkov

Reputation: 140

One addition to the answer of @Adil B: @transform_df can handle only one output, whereas @transform can have multiple outputs, but you are in charge of writing each output yourself:

from pyspark.sql import DataFrame
from transforms.api import transform_df, Input, Output

@transform_df(
    Output("some_foundry_id"),
    input_dataset=Input("another_foundry_id"),
)
def compute(input_dataset: DataFrame) -> DataFrame:
    return input_dataset

The DataFrame you return here is saved by Foundry to the declared output.

from transforms.api import transform, Input, Output, TransformInput, TransformOutput

@transform(
    input_1=Input("..."),
    output_1=Output("..."),
    output_2=Output("..."),
)
def compute(input_1: TransformInput, output_1: TransformOutput, output_2: TransformOutput) -> None:
    # Each output must be written explicitly.
    output_1.write_dataframe(input_1.dataframe())
    output_2.write_dataframe(input_1.dataframe())

Upvotes: 3

Adil B

Reputation: 16856

There's a small difference between the @transform and @transform_df decorators in Code Repositories:

  • @transform_df operates exclusively on DataFrame objects.
  • @transform operates on transforms.api.TransformInput and transforms.api.TransformOutput objects rather than DataFrames.

If your data transformation depends exclusively on DataFrame objects, you can use the @transform_df() decorator. This decorator injects DataFrame objects and expects the compute function to return a DataFrame.

Alternatively, you can use the more general @transform() decorator and explicitly call the dataframe() method to access a DataFrame containing your input dataset.
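As a minimal sketch of that pattern (the dataset RIDs are placeholders), the `@transform` version of the single-input, single-output case looks like this; the only real difference from `@transform_df` is the explicit `dataframe()` read and `write_dataframe()` write:

```python
from transforms.api import transform, Input, Output, TransformInput, TransformOutput

@transform(
    my_output=Output("some_foundry_id"),
    my_input=Input("another_foundry_id"),
)
def compute(my_output: TransformOutput, my_input: TransformInput) -> None:
    # Unwrap the TransformInput into a Spark DataFrame...
    df = my_input.dataframe()
    # ...and write it back explicitly, instead of returning it.
    my_output.write_dataframe(df)
```

This extra verbosity is what buys you multiple outputs, access to file-level APIs on the input, and control over how each output is written.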

Upvotes: 10
