David542

Reputation: 110502

Why is BigQuery so slow on non-large data sizes?

We have found BigQuery to work great on data sets larger than 100M rows, where the 'initialization time' doesn't really come into effect (or is negligible compared to the rest of the query).

However, on anything under that, the performance is quite poor, which makes it (1) ill-suited to working in an interactive BI tool; and (2) inferior to other products, such as Redshift or even ElasticSearch, where the data size is under 100M rows. In fact, an engineer at our organization who was evaluating technologies for querying data sizes between 1M and 100M rows, for an analytics product with about 1000 users, reported that he could not believe how slow BigQuery was.

Without asking for a defense of the BigQuery product, I was wondering whether there are any plans to address the following:

  1. Improving the speed of BigQuery -- especially its initialization time -- on queries of non-massive data sets.
  2. Will BigQuery ever be able to deliver sub-second response times on 'regular' queries (such as a simple aggregation with GROUP BY) on datasets under a certain size?

Upvotes: 26

Views: 11806

Answers (3)

dylanvanw

Reputation: 3341

BigQuery finally released a new feature to address this problem.

It allows short queries to run much faster, which could make it possible to use BigQuery data in user-facing dashboards or applications.

To run a query as a short optimized query, you simply have to select it as the query mode when executing the query.


Upvotes: 0

Murta

Reputation: 2215

Exactly 4 years after this question, we have great news for BigQuery users! As stated in this BI Engine release note from 2021-02-25:

The BI Engine SQL interface expands BI Engine to integrate with other business intelligence (BI) tools such as Looker, Looqbox, Tableau, Power BI, and custom applications to accelerate data exploration and analysis. This page provides an overview of the BI Engine SQL interface, and the expanded capabilities that it brings to this preview version of BI Engine.

I believe this can solve the query latency issue mentioned in David542's question.

Upvotes: 4

Elliott Brossard

Reputation: 33765

The time is spent on metadata/initiation; the actual execution time is very small. We have work in progress that will address this, but some of the changes are complicated and will take a while.
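To make the fixed-overhead point concrete, here is a minimal toy model, not a measurement of BigQuery. It assumes total latency is a constant start-up cost plus scan time proportional to row count. The function names and the numbers (`overhead_s=2.0`, `rows_per_s=1e9`) are hypothetical and chosen only for illustration.

```python
# Toy latency model: fixed start-up overhead plus time proportional to rows scanned.
# All defaults are hypothetical illustration values, not BigQuery measurements.


def total_latency_s(rows: int, overhead_s: float = 2.0, rows_per_s: float = 1e9) -> float:
    """Modeled wall-clock seconds for a query scanning `rows` rows."""
    return overhead_s + rows / rows_per_s


def overhead_share(rows: int, overhead_s: float = 2.0, rows_per_s: float = 1e9) -> float:
    """Fraction of the modeled latency that is fixed start-up overhead."""
    return overhead_s / total_latency_s(rows, overhead_s, rows_per_s)


if __name__ == "__main__":
    for rows in (1_000_000, 100_000_000, 10_000_000_000):
        print(
            f"{rows:>14,} rows: "
            f"{total_latency_s(rows):6.2f} s total, "
            f"{overhead_share(rows):.0%} of it overhead"
        )
```

Under these assumptions, the fixed cost accounts for nearly all of the latency at 1M rows and only a small fraction at 10B rows, which matches the pattern described in the question: the overhead is negligible on very large scans and dominant on small ones.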

You can imagine that in its infancy, BigQuery could have central systems for managing jobs, metadata, etc. in a manner that performed very well for all N0 entities using the service. Once you get to N1 entities, however, it may be necessary to rearchitect some things to make them have as little latency as possible. For notification about new features--which is also where we would announce API improvements related to start-up latency--keep an eye on our release notes, which you can also subscribe to as an RSS feed.

Upvotes: 18
