yadongwen

Reputation: 31

How to get records from row index 1 million to 2 million in BigQuery tables?

I've been trying to download the M-Lab dataset from BigQuery recently. There seems to be a limit of roughly 1 million rows that a single query can return, but the M-Lab dataset contains several billion records across many tables. I'd like to use a query like

bq query --destination_table=mydataset.table1 "select * from (select ROW_NUMBER() OVER() row_number, * from (select * from [measurement-lab:m_lab.2013_03] limit 10000000)) where row_number between 2000001 and 3000000;"

but it didn't work. Is there a workaround to make it work? Thanks a lot!

Upvotes: 1

Views: 2478

Answers (1)

Jordan Tigani

Reputation: 26617

If you're trying to download a large table (like the m-lab table), your best option is to use an extract job. For example, run

bq extract 'mlab-project:dataset.table' 'gs://bucket/foo*'

This will extract the table to the Google Cloud Storage objects gs://bucket/foo000000000.csv, gs://bucket/foo000000001.csv, etc. The default export format is CSV, but you can pass `--destination_format=NEWLINE_DELIMITED_JSON` to extract the table as JSON.
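If you'd rather drive the export from code instead of the bq CLI, here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and bucket names are placeholders you would replace with your own:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder identifiers: substitute your own project, dataset, table, and bucket.
source_table = "mlab-project.dataset.table"
destination_uri = "gs://bucket/foo-*.json"

# Request newline-delimited JSON instead of the default CSV.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)

extract_job = client.extract_table(source_table, destination_uri, job_config=job_config)
extract_job.result()  # block until the export job finishes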

The other thing to mention is that you can read starting at the 1 millionth row in BigQuery using the tabledata.list API, which reads from a particular row offset (no query required!). For example,

bq head -n 1000 -s 1000000 'm-lab-project:dataset.table'

will read 1000 rows starting at the 1000000th row.
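The same offset-based read is available programmatically; this is a rough sketch with the google-cloud-bigquery Python client (list_rows calls tabledata.list under the hood), again with a placeholder table name:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table id: substitute your own project, dataset, and table.
table = client.get_table("m-lab-project.dataset.table")

# Read 1000 rows starting at row offset 1,000,000 without running a query job.
rows = client.list_rows(table, start_index=1000000, max_results=1000)
for row in rows:
    print(dict(row.items()))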

Upvotes: 5
