Prakash Dutta

How to use --selector and --defer in getdbt? Please share some examples

I am using getdbt on Redshift for data analytics operations. Can anyone suggest how to use --selector and --defer with "dbt run" commands? What is the syntax? What is the selectors.yml file used for? Please share some examples.

Thanks

Answers (1)

sgdata

My interpretation of --defer is that it lets the dbt CLI work with models that aren't built in your current environment by resolving them against a different state of the project (for example, the artifacts from a production run, passed in with --state).

For an example of why you might want that, see #2740 - Automating Non Regression Test.
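Here is a toy Python model of the ref() resolution that --defer changes, roughly following how recent dbt documentation describes deferral: a referenced model that is neither selected in the current run nor already built in the current environment resolves to the relation recorded in the other state's manifest (typically production). The Relation class, resolve_ref function, and all model and schema names are hypothetical; this illustrates the idea and is not dbt's code. On the real CLI, --defer is paired with --state, e.g. dbt run --select state:modified+ --defer --state path/to/prod_artifacts, where path/to/prod_artifacts is a placeholder for a directory holding a production manifest.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    schema: str
    name: str

    def __str__(self) -> str:
        return f"{self.schema}.{self.name}"


def resolve_ref(
    node: str,
    selected: set[str],
    built_in_target: set[str],
    target_schema: str,
    state_manifest: dict[str, Relation],
    defer: bool = True,
) -> Relation:
    """Hypothetical toy: how ref(node) resolves with or without deferral."""
    # Selected nodes, or nodes already built in the target schema, resolve normally.
    if not defer or node in selected or node in built_in_target:
        return Relation(target_schema, node)
    # Otherwise fall back to the relation recorded in the other state's manifest.
    return state_manifest.get(node, Relation(target_schema, node))


# Hypothetical production manifest taken from a previous run's artifacts.
prod_manifest = {
    "stg_zuora_subscription": Relation("analytics_prod", "stg_zuora_subscription"),
    "fct_arr": Relation("analytics_prod", "fct_arr"),
}

# Dev run: only fct_arr is selected and nothing has been built in dbt_dev yet.
for upstream in ["stg_zuora_subscription", "fct_arr"]:
    print(upstream, "->", resolve_ref(upstream, {"fct_arr"}, set(), "dbt_dev", prod_manifest))
```

In this sketch, a dev run that selects only fct_arr still reads its upstream staging model from analytics_prod, so the whole upstream chain doesn't have to be rebuilt in dbt_dev first.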


Selectors are a relatively new feature, so I haven't seen much documentation backing this up either, but a selector is effectively a name for a set of logical selection criteria (more than one tag, multiple directories, etc.).

In general, I'd recommend this article for understanding how dbt generates the build path (the DAG of models) for a typical dbt run: How we made dbt runs 30% faster

From there, you can imagine that a large project contains long, interconnected chains of models for every raw -> analytics-ready transformation pipeline you have.
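To picture one of those chains, here is a toy Python model in which each model lists the models it ref()s; all names are hypothetical. Walking upstream from a reporting model yields the set that dbt's + graph operator (dbt run --select +rpt_arr_month_end) or parents: true in a selector definition would pick up. dbt performs this walk on its own manifest; the code only illustrates the idea.

```python
from collections import deque

# Hypothetical toy DAG: each model maps to the models it ref()s (its parents).
PARENTS = {
    "stg_zuora_subscription": [],
    "stg_salesforce_account": [],
    "int_subscription_accounts": ["stg_zuora_subscription", "stg_salesforce_account"],
    "fct_arr": ["int_subscription_accounts"],
    "rpt_arr_month_end": ["fct_arr"],
    "rpt_sales_pipeline": ["stg_salesforce_account"],
}


def with_ancestors(model: str) -> set[str]:
    """Return the model plus everything upstream of it (breadth-first walk)."""
    seen = {model}
    queue = deque([model])
    while queue:
        for parent in PARENTS[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(with_ancestors("rpt_arr_month_end")))
```

Note that rpt_sales_pipeline shares an upstream model with the ARR chain but is not part of it, which is exactly the kind of overlap that makes "everything in one folder" a poor proxy for "everything this pipeline needs".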

We'll use GitLab's open dbt project as an example.

GitLab doesn't currently use selectors, but they do make use of tags, so they could build a selectors.yml file from logical definitions like:

selectors.yml

selectors:
  - name: sales_funnel
    definition:
      union:
        - method: tag
          value: salesforce
        - method: tag
          value: sales_funnel
  - name: arr
    description: builds all arr models to current state + all upstream dependencies (zoho, zuora subscriptions, etc.)
    default: true
    definition:
      union:
        - method: tag
          value: zuora_revenue
          parents: true  # same as the "+" graph operator: also select everything upstream
        - method: tag
          value: arr
          parents: true
  - name: month_end_process
    description: builds reporting models about customer segments based on subscription activity for latest closed month
    definition:
      union:
        - method: fqn
          value: rpt_available_to_renew_month_end
          indirect_selection: eager  # default: will include all tests that touch selected model
        - method: fqn
          value: rpt_possible_to_churn_month_end
          indirect_selection: eager

The full list of valid selector definitions is documented here: https://docs.getdbt.com/reference/node-selection/yaml-selectors#default
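To see what those definitions select, here is a toy Python evaluation of two of them, reading sales_funnel as the union of its two tags (if you only want models carrying both tags, that would be an intersection instead). MODEL_TAGS, SELECTORS and evaluate are hypothetical; only union, tag, and a bare-name version of fqn are handled, and dbt itself resolves selectors against its own manifest.

```python
# Hypothetical model -> tags mapping; in a real project tags come from model configs.
MODEL_TAGS = {
    "stg_salesforce_opportunity": {"salesforce"},
    "fct_sales_funnel": {"salesforce", "sales_funnel"},
    "rpt_funnel_conversion": {"sales_funnel"},
    "rpt_available_to_renew_month_end": {"arr"},
    "rpt_possible_to_churn_month_end": {"arr"},
}

# Selector definitions in the spirit of selectors.yml, written as plain dicts.
SELECTORS = {
    "sales_funnel": {"union": [
        {"method": "tag", "value": "salesforce"},
        {"method": "tag", "value": "sales_funnel"},
    ]},
    "month_end_process": {"union": [
        {"method": "fqn", "value": "rpt_available_to_renew_month_end"},
        {"method": "fqn", "value": "rpt_possible_to_churn_month_end"},
    ]},
}


def evaluate(definition: dict) -> set[str]:
    """Resolve a definition to model names (toy: union, tag and bare-name fqn only)."""
    if "union" in definition:
        return set().union(*(evaluate(d) for d in definition["union"]))
    method, value = definition["method"], definition["value"]
    if method == "tag":
        return {m for m, tags in MODEL_TAGS.items() if value in tags}
    if method == "fqn":
        # Simplification: real fqn matching works on the model's full project path.
        return {value} if value in MODEL_TAGS else set()
    raise ValueError(f"unsupported method: {method}")


for name, definition in SELECTORS.items():
    print(name, sorted(evaluate(definition)))
```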

That gives them the ability to simply execute the following on a cron job, via Airflow, or from some other orchestrator:

dbt run --selector month_end_process --full-refresh

and be confident that the logical selection of models for that process is reproduced exactly, instead of relying on a more fallible approach such as assuming that all the required models live in a single directory:

dbt run --models marts.finance.restricted_safe.reports --full-refresh
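If the orchestrator runs Python (an Airflow PythonOperator or a plain script on cron), the month_end_process call could be wrapped roughly like this. It is a minimal sketch under assumptions: dbt_run_command and run_selector are hypothetical helpers, the dbt CLI is assumed to be on PATH, and project_dir is assumed to contain dbt_project.yml and selectors.yml.

```python
import shlex
import subprocess


def dbt_run_command(selector: str, full_refresh: bool = False) -> list[str]:
    """Build the argv for 'dbt run --selector <name>', optionally with --full-refresh."""
    cmd = ["dbt", "run", "--selector", selector]
    if full_refresh:
        cmd.append("--full-refresh")
    return cmd


def run_selector(selector: str, full_refresh: bool = False, project_dir: str = ".") -> None:
    """Run dbt inside project_dir; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(dbt_run_command(selector, full_refresh), cwd=project_dir, check=True)


if __name__ == "__main__":
    # Print the command the scheduler would run, without actually invoking dbt.
    print(shlex.join(dbt_run_command("month_end_process", full_refresh=True)))
```

Building the argv as a list rather than a shell string avoids quoting problems, and check=True turns a failed dbt run into an exception the orchestrator can retry or alert on.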


Architecturally, you likely won't need selectors until you have multiple layers of tags and/or multiple layers of use-case directories to keep track of within a single run.

For example: tags for each model's function, tags for its sources, tags for the BI/analyst consumers, tags for the materialization schedule, etc.
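When tags come in layers like these, the useful selections are usually intersections across layers. The toy Python below (hypothetical models and tags) shows the set logic. On the CLI the same idea is written as dbt run --select tag:finance,tag:daily (comma means intersection, space means union), and in selectors.yml as an intersection: block, so the rule gets a name instead of being repeated in every cron job.

```python
# Hypothetical tags along two independent layers: business function and schedule.
MODEL_TAGS = {
    "fct_arr": {"finance", "daily"},
    "rpt_arr_month_end": {"finance", "monthly"},
    "fct_sales_funnel": {"sales", "daily"},
}


def tagged(tag: str) -> set[str]:
    """Models carrying the given tag (hypothetical helper)."""
    return {m for m, tags in MODEL_TAGS.items() if tag in tags}


# Union (space-separated on the CLI): anything tagged finance OR daily.
finance_or_daily = tagged("finance") | tagged("daily")
# Intersection (comma-separated on the CLI): only models tagged finance AND daily.
finance_and_daily = tagged("finance") & tagged("daily")

print(sorted(finance_or_daily))
print(sorted(finance_and_daily))
```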
