Reputation: 2659
The records started out in AVRO files that were created daily with the following schema. There are 20 different attribute types stored in the "attribute_key" and "attribute_value" records, a timestamp, and device_id are also included in each measurement.
"fields" : [
{"type":"string", "name":"device_id"},
{"type":"string", "name":"record_date"},
{"type":"string", "name":"attribute_key"},
{"type":"string", "name":"attribute_value"}]
I have been able to take the daily files and load them into month seperated tables in bigquery.
device_attributes201501
device_attributes201502
device_attributes201503
device_attributes201504
device_attributes201505
device_attributes201506
device_attributes201507
device_attributes201508
device_attributes201509
device_attributes201510
device_attributes201511
device_attributes201512
My problem is two-fold,
I need to create a table that consists all of the unique device_ids collected across all time, and the latest attribute value for each value type.
device_id, record_date, attribute_key, attribute_value
abc123 2015-10-11 attribute_1 5
abc123 2015-11-11 attribute_1 5
abc123 2015-12-11 attribute_1 10
abc123 2015-10-11 attribute_1 0
abc456 2015-10-11 attribute_1 0
abc789 2015-10-11 attribute_1 0
abc123 2015-11-11 attribute_1 0
abc456 2015-11-11 attribute_1 0
abc789 2015-11-11 attribute_1 6
abc123 2015-10-11 attribute_2 blue
abc123 2015-11-11 attribute_2 red
abc123 2015-12-11 attribute_2 red
abc456 2015-12-11 attribute_2 blue
abc789 2015-12-11 attribute_2 green
A weekly, monthly, and 90 day average would also need to be computed for some attributes. (attribute_3 is the average of samples collected)
device_id, last_update, attribute_1, attribute_2
abc123 2015-12-11 6 red
abc456 2015-12-11 0 blue
abc789 2015-12-11 3 green
I'm curious how best to take take this on, and I have no idea where to go from here. The data is in bigquery now, I have access to the full suite of google clould tools... like dataflow, or anything else.
The data was originally in an S3 bucket, so I can process it with any of the solutions on AWS.
I just don't know what's the smartest way to go about doing it.
Upvotes: 0
Views: 258
Reputation: 1725
Hopefully some of these links will help you. Creating a table https://cloud.google.com/bigquery/docs/tables#creating-a-table
BigQuery Web UI https://cloud.google.com/bigquery/bigquery-web-ui
How to create a table from a query (blog post from a user). This one suggests that you can use the BQ WebUI and specify a destination table. I couldn't find that in the official docs, so not sure if this works. If not you'll need to setup the API and write a bit of code as shown in their example above. https://chartio.com/resources/tutorials/how-to-create-a-table-from-a-query-in-google-bigquery/
Upvotes: 1