Reputation: 19805
In Google BigQuery [1], it support nested field type, so can I say that it is preferred way to organize than splitting into tables and query with join?
Bigquery support patch when we need to add a new columns, how is the performance when adding new column in a very large table, e.g. 1TB, are there any way to min. down time since we might need to update columns a few time a month.
[1] https://developers.google.com/bigquery/docs/data#nested
[2] https://developers.google.com/bigquery/docs/reference/v2/tables/patch
Upvotes: 3
Views: 1301
Reputation: 47
Yes. Nesting fields may be more useful in some cases of denormalization like
{
"geo": {
"type": "Polygon",
"coordinates": [-122.82613, 45.393108 ]
}
}
The schema for this might be
{"name":"geo","type":"record",
"fields":
[
{"name":"type","type":"string"},
{"name":"coordinates","type":"float","mode":"repeated"}
]
}
But in case of multi-level nested arrays (like regular Twitter JSON), BigQuery seems to report some issues.Below is a very common format of JSON output from Twitter where Coordinates are expressed in nested (multi-level arrays):
{
"geo": {
"type": "Polygon",
"coordinates": [
[
[
-122.82613,
45.393108
],
[
-122.82613,
45.456634
],
[
-122.74354,
45.456634
]
]
]
}
}
--> honestly I am still to find answer for that.
Upvotes: -1
Reputation: 26617
I wouldn't necessarily say that using nested fields is always the preferred way to organize your tables; it depends on your use case. In general, I find that nested data is usually easier to deal with, especially if most queries don't need access to the nested fields.
However, if you're using the nested fields heavily in your computation, need to correlate different nested fields with eachother, or returning results with multiple repeated fields, you might be better off using multiple tables.
Adding new columns is a metadata-only operation, so it should be O(1) time; there is no downtime for the table involved. (note that only adding columns is supported, not renaming or dropping them).
Upvotes: 4