turtlepick
turtlepick

Reputation: 2714

Google BigQuery Appending / Updating data

As a background, I've been researching Google BigQuery to be used as a backend platform for a BI type tool. I'd like to point out I haven't yet used BigQuery, so my questions are around what I've seen on the docs.

The high-level plan was to - perhaps - have Big Query as a 'live' source for a dashboard built on top of Tableau.

Apparently the best way to load the data would be using a denormalized structure using JSON (that supports nesting)

I see my JSON looking something like this:

{
    FirstName: 'John',
    LastName: 'Doe',
    Orders: {
        orderNo: 12345, 
        orderDate: '2013-01-01'
        orderlines: {
          lineNo: 1,
          qty: 1,
          price: 12,
          productId: 1234
          productName: 'Learning System',
         productSubsystem: 'SUB'
       }
    },
    LeadScores: {
       {
        scoreName: 'Learning Tech',
        scoreValue: 123,
        scoreDate: '2013-01-01'
       },
       {
       scoreName: 'ScoreB',
       scoreValue: 15,
       scoreDate: '2013-01-01'
       }
    },
    Activities {
     ** email opens, email clicks, page view, etc. (all here) **
      {
        activityType: 'email',
        activityAction: 'open',
        activityDescription 'message-1234'
      }
    }

}

Now my questions:

Can I append records to "inner" collections (like I want to append more activities daily)? Or does it need to be another entity? (like the hole JSON is a single entity)

Does this structure makes sense or would be better to have "3" or so entities (Activities, Orders, Demographics, Scores) and use JOINs? I read that BigQuery prefers not to use JOINs.

A potential structure could be

For Scores:

 {  
   date: '2013-01-01',
   scoreName: 'Score A',
   scoreValue: '1234',
   customerId: '123456'
 }

For Activities:

 {  
   date: '2013-01-01',
   activityType: 'email',
   activityAction: 'open',
   extra: '',
   customerId: '123456'
 }

For Demographic

  {
       customerId: '123456',
       firstName: 'A',
       lastName: 'B', etc..

  }

Which approach makes more sense?

Thank you!

Upvotes: 1

Views: 2348

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59325

Question a) is simple: You can add rows and columns, but you can't change existing rows. Adding data to a nested structure of an existing row, would qualify as changing said row - hence not possible.

Question b) is a design question. You got the basics, but without knowing your querying pattern, is hard to optimize. BigQuery is flexible enough to work both ways, but optimizing will require more data.

A good rule of thumb to follow is that storage is cheap: You could keep normalized and denormalized data - and choose depending on the query. The best pattern to use will surface up fast this way!

Upvotes: 1

Related Questions