ZygD

Reputation: 24356

Creating Expectations for all columns of certain type in Palantir Foundry

I use expectations and a Check to determine whether a column of decimal type can be transformed into int or long type. A column can be safely transformed if it contains integers, or decimals whose fractional part contains only zeros. I check this using the regex function rlike, as I couldn't find any other way to do it with expectations.

The question is: can I do such a check for all columns of decimal type without explicitly listing the column names? df.columns is not yet available, as we are not yet inside my_compute_function.

from transforms.api import transform_df, Input, Output, Check
from transforms import expectations as E


@transform_df(
    Output("ri.foundry.main.dataset.1e35801c-3d35-4e28-9945-006ec74c0fde"),
    inp=Input(
        "ri.foundry.main.dataset.79d9fa9c-4b61-488e-9a95-0db75fc39950",
        checks=Check(
            # raw string + grouping, so the ^...$ anchors apply to both alternatives
            E.col('DSK').rlike(r'^((\d+(\.0+)?)|(0E-10))$'),
            'Decimal col DSK can be converted to int/long.',
            on_error='WARN'
        )
    ),
)
def my_compute_function(inp):
    return inp

Upvotes: 0

Views: 520

Answers (1)

psar

Reputation: 11

You are right in that df.columns is not available before my_compute_function's scope is entered. There is also no way to add expectations at runtime, so with this method you have to hard-code the column names and generate the expectations from that list.
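A minimal sketch of that generation pattern, assuming a hand-maintained column list (DECIMAL_COLS, PRICE and AMOUNT are hypothetical names) and relying on checks accepting a list of Check objects:

from transforms.api import transform_df, Input, Output, Check
from transforms import expectations as E

# Hypothetical, hand-maintained list: the schema is not available
# when the decorator is evaluated, so the columns must be listed here.
DECIMAL_COLS = ['DSK', 'PRICE', 'AMOUNT']

# Build one Check per column at module import time.
decimal_checks = [
    Check(
        E.col(c).rlike(r'^((\d+(\.0+)?)|(0E-10))$'),
        f'Decimal col {c} can be converted to int/long.',
        on_error='WARN',
    )
    for c in DECIMAL_COLS
]


@transform_df(
    Output("ri.foundry.main.dataset.1e35801c-3d35-4e28-9945-006ec74c0fde"),
    inp=Input(
        "ri.foundry.main.dataset.79d9fa9c-4b61-488e-9a95-0db75fc39950",
        checks=decimal_checks,
    ),
)
def my_compute_function(inp):
    return inp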

To touch on the first part of your question: as an alternative approach, you could attempt the decimal -> int/long conversion in an upstream transform, store the result in a separate column, and then use E.col('col_a').equals_col('converted_col_a'), as sketched below.

This way you simplify your Expectation condition while also implicitly handling the cases in which the conversion would overflow or underflow, since DecimalType can hold up to 38 digits of precision, well beyond the range of int or long (https://spark.apache.org/docs/latest/sql-ref-datatypes.html).
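A minimal sketch of that two-step setup, assuming placeholder dataset RIDs and a hypothetical DSK_long helper column:

from transforms.api import transform_df, Input, Output, Check
from transforms import expectations as E
import pyspark.sql.functions as F


# Step 1 (upstream): attempt the cast and keep both columns.
# cast('long') truncates any fractional part and yields null on
# overflow, so a mismatch with the original flags both failure modes.
@transform_df(
    Output("ri.foundry.main.dataset.converted"),  # placeholder RID
    source=Input("ri.foundry.main.dataset.raw"),  # placeholder RID
)
def convert(source):
    return source.withColumn('DSK_long', F.col('DSK').cast('long'))


# Step 2 (downstream): the expectation reduces to an equality check.
@transform_df(
    Output("ri.foundry.main.dataset.checked"),  # placeholder RID
    inp=Input(
        "ri.foundry.main.dataset.converted",
        checks=Check(
            E.col('DSK').equals_col('DSK_long'),
            'Decimal col DSK round-trips to long without loss.',
            on_error='WARN',
        ),
    ),
)
def my_compute_function(inp):
    return inp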

Upvotes: 1
