Reputation: 1525
I am setting up BigQuery for a greenfield GCP implementation, and I am wondering whether there is any best practice for project and dataset organisation, e.g. should I create a single project with different datasets for all sources/layers (raw, processed, datamart), or different projects for different use cases and access patterns?
Option 1:
Project
|_ Dataset_RAW
|_ Dataset_Processed
|_ Dataset_Datamart_Finance
|_ Dataset_Datamart_Marketing
Option 2:
Project RAW:
|_ Dataset_Source_A
|_ Dataset_Source_B
Project Processed:
|_ Dataset_Standardized
Project Finance:
|_ Dataset_Finance_DataMart
Project Marketing:
|_ Dataset_Marketing_DataMart
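For reference, either layout can be provisioned programmatically. Below is a minimal sketch with the google-cloud-bigquery Python client (project IDs such as single-analytics-project and raw-project are placeholders, not real projects); it also shows that Option 2 does not block joins, since a query can reference tables across projects with the project.dataset.table syntax.

```python
# Minimal sketch; all project IDs below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# Option 1: one project holding a dataset per layer.
for dataset_id in ["Dataset_RAW", "Dataset_Processed",
                   "Dataset_Datamart_Finance", "Dataset_Datamart_Marketing"]:
    dataset = bigquery.Dataset(f"single-analytics-project.{dataset_id}")
    dataset.location = "EU"  # pick one location and keep it consistent
    client.create_dataset(dataset, exists_ok=True)

# Option 2 still allows cross-layer queries: one query can read tables
# from several projects via project.dataset.table references.
query = """
SELECT std.*
FROM `raw-project.Dataset_Source_A.events` AS raw
JOIN `processed-project.Dataset_Standardized.customers` AS std
  ON raw.customer_id = std.customer_id
"""
# rows = client.query(query).result()  # billed to the client's project
```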
I suppose it's a broad question and depends a lot on company objectives, but I am still curious whether there are any guidelines available for different scenarios.
Upvotes: 1
Views: 558
Reputation: 75735
There are two things to know:
Another consideration: if you want to secure your data with VPC Service Controls, it is worth storing the sensitive data in a dedicated project (the one you protect with the VPC SC perimeter).
As you can see, it all depends on your organisation, your strategy and your preferences. My advice is to mirror the real team organisation in the project organisation. You have 3 different teams? Configure 3 projects, with each team responsible for its own project.
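As a sketch of that per-team setup (the group emails and project/dataset IDs below are assumptions for illustration), you can give each team's Google group access to its own datamart dataset with the Python client:

```python
# Hedged sketch: grant each team's group access to its datamart dataset.
# Group emails and project/dataset IDs are assumptions, not real resources.
from google.cloud import bigquery

client = bigquery.Client()

grants = {
    "finance-project.Dataset_Finance_DataMart": "finance-team@example.com",
    "marketing-project.Dataset_Marketing_DataMart": "marketing-team@example.com",
}

for dataset_id, group_email in grants.items():
    dataset = client.get_dataset(dataset_id)
    entries = list(dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(
            role="WRITER",              # the owning team maintains its datamart
            entity_type="groupByEmail",
            entity_id=group_email,
        )
    )
    dataset.access_entries = entries
    client.update_dataset(dataset, ["access_entries"])
```

Dataset-level grants like this keep each team autonomous in its own project, while read access can still be granted across teams where needed.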
Upvotes: 2