Reputation: 815
BigQuery (BQ) has its own storage system, which is completely separate from Google Cloud Storage (GCS).
My question is: why doesn't BQ directly process data stored in GCS, the way Hadoop Hive processes data in place? What is the benefit of this design, and why is it necessary?
Upvotes: 1
Views: 172
Reputation: 41
BigQuery gains several benefits from having its own separate storage.
For one, BigQuery can continuously optimize how its data is stored: it moves and reorders the data on the disks that hold it, and as the database grows it adds more disks and repeats the process.
BigQuery also uses a separate compute layer to query the storage layer. This lets the storage layer scale on its own while requiring less overall hardware to run queries. BigQuery can call on more processing power when it needs it, but no hardware sits idle when no queries are being executed against a specific database.
For a more in-depth explanation of BigQuery's structure and optimizations, you can check out this article I wrote for The Data School.
Upvotes: 3
Reputation: 1714
That is because BigQuery stores data in a column-oriented format, and it has background processes that constantly check whether the data is stored in the optimal way. The data is therefore managed by BigQuery itself (that's why it has its own storage), and only the highest layer is exposed to the user.
See this article for more details:
"When you load bits into BigQuery, the service takes on the full responsibility of managing that data, and only exposing the logical database primitives to you."
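To make the column-oriented point a bit more concrete, here is a toy Python sketch. It is not BigQuery's actual storage format or API; the table contents and the function names (`to_columns`, `sum_amount_row_store`, `sum_amount_column_store`) are made up for illustration. It only shows why a query that needs one column reads fewer values when each column is stored on its own.

```python
# Toy illustration only: this is NOT how BigQuery stores data internally.
# The rows and all function names here are hypothetical.

rows = [
    {"user_id": 1, "country": "US", "amount": 10.0},
    {"user_id": 2, "country": "DE", "amount": 25.5},
    {"user_id": 3, "country": "US", "amount": 4.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_row_store(rows):
    """Row layout: whole rows are read, even though only 'amount' is needed."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # every field of the row is touched
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    values = columns["amount"]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
print(f"row store:    total={row_total}, values read={row_reads}")
print(f"column store: total={col_total}, values read={col_reads}")
```

Both layouts give the same total, but the column layout reads only the values the query asks for. A system that controls its own storage can choose and maintain a layout like this, which is harder to guarantee for arbitrary files that users drop into a bucket.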
Upvotes: 3