Reputation: 110382
Let us suppose I have a 1TB dataset in BigQuery, and I want to be able to view the data in a columnar view, limiting to 1000 results. Here are a few of the queries I might use:
1. SELECT * FROM mytable LIMIT 1000
2. SELECT first_name, last_name FROM mytable LIMIT 1000
3. SELECT last_name, first_name FROM mytable LIMIT 1000
4. SELECT * FROM mytable ORDER BY first_name LIMIT 1000
If I ran these four queries I would be charged ~$20 ($5/tb, pretend *
= first_name, last_name). This seems like a very high amount to pay to just sample the data -- is there another way to query this data to view a limited view of the data, like the above?
Upvotes: 2
Views: 1127
Reputation: 202
We have a sample table that's generated every day at work which I find extremely useful for many tasks. It's as simple as:
SELECT * FROM mytable WHERE RAND() < 0.01
The table is hierarchical, and this sampling is set to reproduce the whole structure; so queries can be tested/replicated in exactly the same form and then swapped over to the big table if needed. The 1% sample applies to the top level of the hierarchy (meaning you don't have to wonder whether you are getting valid results from branches).
For us, there is enough data that sums and ratios are generally very representative. The only kind of data that poses a significant problem is relatively rare events, which means counts of unique elements can't be relied on.
And of course, after the single daily charge for making this table, the billing goes from dollars to cents!
Upvotes: 0
Reputation: 173046
This seems like a very high amount to pay to just sample the data -- is there another way to
If your data dynamic, meaning is updated daily or whatever other way - you can use Table Decorators
For example
SELECT * FROM mytable@-3600000--1800000 LIMIT 1000
will query only data inserted within last hour, thus lowering cost a lot!!
Another option is to use Day partitioned tables so you can query only specific day worth of data
Is there a way to export a subset of the data instead of doing a query?
Yes. You can use Tabledata.list API to list page-by-page data in your original table and insert into new [sampled] table using whatever sampling logic you need. Note: this API is free as it actually doesn't use BigQuery query engine per se, but rather reading from underlying storage!!! so you can be reasonably wild :o)
Of course you need to implement this in client of your choice.
Upvotes: 7
Reputation: 1270331
I assume you are accessing BQ through the online query interface (https://bigquery.cloud.google.com/table . . . ).
Click on the table in the data set. Go down to where it says "Table Details" in bold letters, beneath the "Run Query" icon.
In the second row below that is an option for "Preview". This will show you some data and it's free.
Upvotes: 3