Reputation: 2714
As a background, I've been researching Google BigQuery to be used as a backend platform for a BI type tool. I'd like to point out I haven't yet used BigQuery, so my questions are around what I've seen on the docs.
The high-level plan was to - perhaps - have Big Query as a 'live' source for a dashboard built on top of Tableau.
Apparently the best way to load the data would be using a denormalized structure using JSON (that supports nesting)
I see my JSON looking something like this:
{
FirstName: 'John',
LastName: 'Doe',
Orders: {
orderNo: 12345,
orderDate: '2013-01-01'
orderlines: {
lineNo: 1,
qty: 1,
price: 12,
productId: 1234
productName: 'Learning System',
productSubsystem: 'SUB'
}
},
LeadScores: {
{
scoreName: 'Learning Tech',
scoreValue: 123,
scoreDate: '2013-01-01'
},
{
scoreName: 'ScoreB',
scoreValue: 15,
scoreDate: '2013-01-01'
}
},
Activities {
** email opens, email clicks, page view, etc. (all here) **
{
activityType: 'email',
activityAction: 'open',
activityDescription 'message-1234'
}
}
}
Now my questions:
Can I append records to "inner" collections (like I want to append more activities daily)? Or does it need to be another entity? (like the hole JSON is a single entity)
Does this structure makes sense or would be better to have "3" or so entities (Activities, Orders, Demographics, Scores) and use JOINs? I read that BigQuery prefers not to use JOINs.
A potential structure could be
For Scores:
{
date: '2013-01-01',
scoreName: 'Score A',
scoreValue: '1234',
customerId: '123456'
}
For Activities:
{
date: '2013-01-01',
activityType: 'email',
activityAction: 'open',
extra: '',
customerId: '123456'
}
For Demographic
{
customerId: '123456',
firstName: 'A',
lastName: 'B', etc..
}
Which approach makes more sense?
Thank you!
Upvotes: 1
Views: 2348
Reputation: 59325
Question a) is simple: You can add rows and columns, but you can't change existing rows. Adding data to a nested structure of an existing row, would qualify as changing said row - hence not possible.
Question b) is a design question. You got the basics, but without knowing your querying pattern, is hard to optimize. BigQuery is flexible enough to work both ways, but optimizing will require more data.
A good rule of thumb to follow is that storage is cheap: You could keep normalized and denormalized data - and choose depending on the query. The best pattern to use will surface up fast this way!
Upvotes: 1