Reputation: 779
I'm building an application where each of our clients needs their own data warehouse (for security, compliance, and maintainability reasons). For each client we pull in data from multiple third party integrations and then merge them into a unified view, which we use to perform analytics and report metrics for the data across those integrations. These transformations and all relevant schemas are the same for all clients. We would need this to scale to 1000s of clients.
From what I gather dbt is designed so each project corresponds with one warehouse. I see two options:
profiles.yml:
example_project:
target: dev
outputs:
dev:
type: redshift
...
client_1:
type: redshift
...
client_2:
type: redshift
...
...
profiles.yml:
client_1_project:
target: dev
outputs:
client_1:
type: redshift
...
client_2_project:
target: dev
outputs:
client_2:
type: redshift
...
Thoughts?
Upvotes: 2
Views: 1717
Reputation: 5815
I think you captured both options.
If you have a single database connection, and your client data is logically separated in that connection, I would definitely pick #2 (one package, many client projects) over #1. Some reasons:
dbt run
, and tear down the required files for each client project (I'm assuming you're not going to use dbt Cloud and have another orchestrator or compute environment that you control). It's easy to write YAML from Python with pyyaml
(each file is just a dict), and your individual projects probably only need separate profiles.yml
, sources.yml
, and (maybe) dbt_project.yml
files. I wouldn't check these generated files for each client into source control -- just check in the script and generate the files you need with each invocation of dbt.On the other hand, if your clients each have their own physical database with separate connections and credentials, and those databases are absolutely identical, you could get away with #1 (one project, many profiles). The "hardest" parts of that approach would likely be managing secrets and generating/maintaining a list of targets that you could iterate over (ideally in a parallel fashion).
Upvotes: 2