Hrusilov

Reputation: 654

Snapshotting of Contour results in Palantir Foundry

I'm preparing a Data Quality Report based on a couple of Contour analyses and would like to take daily snapshots of the reported incorrect records. I then want to show these daily numbers as another report in the same dashboard to track progress on data quality.

My main questions are:

  1. Can a Contour analysis be used as a source for storing data or for further computation?
  2. How should I store these numbers on a daily basis (e.g. in a Fusion spreadsheet, a Code Workbook, etc.)?

Upvotes: 2

Views: 917

Answers (1)

Adil B

Reputation: 16856

Here's one process for setting up daily snapshots of a dataset derived from a Contour analysis:

  1. Ensure that the Contour analysis results are saved as a dataset (using the "Save as dataset" button in Contour). Let's call this dataset mydataset.

  2. Create a Python Transform that takes a snapshot of mydataset on each build and stores the snapshots in a dataset named mydataset_daily_snapshots:

    from transforms.api import transform_df, incremental, Input, Output
    from pyspark.sql import functions as F
    
    # A plain transform_df replaces its output on every build, keeping only the latest
    # snapshot. Making it incremental, with my_input declared as a snapshot input,
    # appends each build's rows to the output instead.
    @incremental(snapshot_inputs=["my_input"])
    @transform_df(
        Output("/output/path/for/mydataset_daily_snapshots"),
        my_input=Input("/path/to/mydataset"),
    )
    def compute(my_input):
        # 'asof_timestamp' records when this snapshot of the row was taken
        out_df = my_input.withColumn("asof_timestamp", F.current_timestamp())
        # Optional: build a primary key (assumes mydataset has a unique 'id' column),
        # in case you want to create an Ontology object later on for use in Workshop
        out_df = out_df.withColumn(
            "primary_key",
            F.concat_ws("-", F.col("id"), F.col("asof_timestamp").cast("string")),
        )
        return out_df
    
  3. Create Build Schedules on both mydataset and mydataset_daily_snapshots that build the datasets daily (or as often as desired), so that mydataset_daily_snapshots gets a new snapshot each day. Make sure to check Force build so that a snapshot is always taken, even if the source data has not changed.
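
To see why the daily forced build yields one snapshot per day, here is a plain-Python model of what each build of mydataset_daily_snapshots does. It is only an illustration, not Foundry code: rows are modelled as dicts, `take_snapshot` is a hypothetical helper standing in for one build, the `id` column is assumed, and a date is used instead of the full `asof_timestamp` for readability.

```python
from datetime import date


def take_snapshot(history, current_rows, asof):
    """Hypothetical helper modelling one scheduled build.

    Rows already in history are kept; each row of current_rows is copied,
    stamped with 'asof_date', and given a 'primary_key' of '<id>-<date>'.
    """
    for row in current_rows:
        snap = dict(row)  # copy so the source rows are left untouched
        snap["asof_date"] = asof.isoformat()
        snap["primary_key"] = f"{row['id']}-{snap['asof_date']}"
        history.append(snap)
    return history


# Hypothetical incorrect records reported on two consecutive days
day1 = [{"id": 1, "issue": "missing email"}, {"id": 2, "issue": "bad zip"}]
day2 = [{"id": 2, "issue": "bad zip"}]

history = []
take_snapshot(history, day1, date(2024, 1, 1))
take_snapshot(history, day2, date(2024, 1, 2))  # forced build on the next day
print(len(history))                         # 3
print([r["primary_key"] for r in history])  # ['1-2024-01-01', '2-2024-01-01', '2-2024-01-02']
```

Record 2 appears twice because it was still incorrect on both days, which is exactly what lets you track progress over time.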

You can then use the mydataset_daily_snapshots dataset within another Contour analysis to show the changes in the data over time in a Report, or create an Ontology object from it and use Workshop to show the change over time.
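
For the progress report, the natural metric is the number of snapshot rows per snapshot date. Below is a minimal plain-Python sketch of that aggregation, assuming mydataset holds only the incorrect records; the `asof_date` column, the `daily_counts` helper, and the sample rows are hypothetical.

```python
from collections import Counter


def daily_counts(snapshots):
    """Hypothetical helper: return {asof_date: row count}, sorted by date.

    Assuming each snapshot row is one incorrect record, the row count per
    asof_date is the number of incorrect records on that day.
    """
    counts = Counter(row["asof_date"] for row in snapshots)
    return dict(sorted(counts.items()))


snapshots = [
    {"id": 1, "asof_date": "2024-01-01"},
    {"id": 2, "asof_date": "2024-01-01"},
    {"id": 2, "asof_date": "2024-01-02"},
]
print(daily_counts(snapshots))  # {'2024-01-01': 2, '2024-01-02': 1}
```

In Contour, the equivalent should be grouping mydataset_daily_snapshots by the snapshot date and counting rows.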

Keep in mind that this dataset can grow very large very quickly, since every daily build adds another full copy of mydataset. Any filtering that keeps it smaller is a good idea, e.g. limiting the snapshots to just the incorrect records, or storing only the total number of incorrect records for each day.
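
If only the daily count is needed, the transform can aggregate before the snapshot timestamp is added, so each build appends a single summary row instead of every incorrect record. The sketch below models that idea in plain Python; `summarize_snapshot`, `with_change`, the column names, and the sample data are all hypothetical, and the day-over-day `change` column is just one possible way to express progress.

```python
from datetime import date


def summarize_snapshot(current_rows, asof):
    """Hypothetical helper: reduce one day's incorrect records to one summary row."""
    return {"asof_date": asof.isoformat(), "incorrect_records": len(current_rows)}


def with_change(summary_rows):
    """Hypothetical helper: add the day-over-day change (None for the first day)."""
    out, prev = [], None
    for row in sorted(summary_rows, key=lambda r: r["asof_date"]):
        change = None if prev is None else row["incorrect_records"] - prev
        out.append({**row, "change": change})
        prev = row["incorrect_records"]
    return out


summaries = [
    summarize_snapshot([{"id": 1}, {"id": 2}], date(2024, 1, 1)),
    summarize_snapshot([{"id": 2}], date(2024, 1, 2)),
]
for row in with_change(summaries):
    print(row)
# {'asof_date': '2024-01-01', 'incorrect_records': 2, 'change': None}
# {'asof_date': '2024-01-02', 'incorrect_records': 1, 'change': -1}
```

A negative `change` means fewer incorrect records than the day before. The snapshot dataset then grows by only one row per build, however large mydataset is.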

Upvotes: 0
