Reputation: 4479
I'm trying to export a table in BigQuery to multiple CSV files of 10k lines each.
My plan is to create multiple temporary tables of 10k lines each from the main table and then export them. Creating a temporary table is easy:
CREATE TEMP TABLE my_temp_table00x
(
  field_a STRING,
  field_b INT64
);
To export tables, I can use the BigQuery EXPORT DATA statement:
EXPORT DATA
OPTIONS (
  uri = 'gs://temp-file-in-gcs/test-export-*.csv',
  format = 'CSV',
  overwrite = true,
  header = true,
  field_delimiter = ',')
AS (
  SELECT field_a, field_b FROM my_temp_table00x
);
But I'm stuck on how to split the main table into multiple 10k-row temp tables...
Upvotes: 0
Views: 552
Reputation: 46
Actually, there is a simpler way using table offsets. An offset defines where you start retrieving data in your query's result: an offset of 200 skips the first 200 rows the query would otherwise return.
With that in mind, we declare two offsets:
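For example, using the `test_table` name from below (column names are illustrative), this returns rows 201 through 210 of the ordered result:

```sql
-- Skip the first 200 rows, then return the next 10.
-- Note: LIMIT/OFFSET only yields stable pages when the
-- ORDER BY is deterministic (no ties).
SELECT field_a, field_b
FROM test_table
ORDER BY field_a
LIMIT 10 OFFSET 200;
```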
-- an offset that will be incremented during a LOOP
DECLARE offset_it INT64 DEFAULT 0;
-- the max offset that is the size of your table
DECLARE offset_max INT64 DEFAULT (SELECT COUNT(*) FROM test_table);
Then you loop over the table in chunks of 10k rows. EXECUTE IMMEDIATE lets you build a fresh SQL statement for each chunk:
LOOP
  EXECUTE IMMEDIATE FORMAT("""
    EXPORT DATA
    OPTIONS (
      -- include the offset in the URI so each chunk writes to
      -- distinct files instead of overwriting the previous export
      uri = 'gs://temp-file-in-gcs/test-export-%d-*.csv',
      format = 'CSV',
      overwrite = true,
      header = true,
      field_delimiter = ',')
    AS (
      SELECT ROW_NUMBER() OVER (ORDER BY field1) AS table_id,
             ds.*
      FROM test_table ds
      ORDER BY table_id -- this is so we'll always have the same sorting
      LIMIT 10000 OFFSET %d
    );"""
  , offset_it, offset_it);
  SET offset_it = offset_it + 10000;
  IF offset_it >= offset_max THEN
    LEAVE;
  END IF;
END LOOP;
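If you want to sanity-check how many iterations (and thus exports) the loop will run before launching it, you can derive the chunk count from the row count. A minimal sketch, assuming the same `test_table`:

```sql
-- Number of 10k-row chunks needed to cover the table:
-- ceiling division done with BigQuery's integer DIV.
SELECT DIV(COUNT(*) + 9999, 10000) AS chunk_count
FROM test_table;
```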
Upvotes: 1