Ismail H
Ismail H

Reputation: 4479

BigQuery Export table in multiple CSV files of 10k rows each

I'm trying to export a table in BigQuery to multiple CSV files of 10k lines each.

My plan is to create multiple temporary tables of 10k lines from the main table then export them. Creation of temporary table is easy :

CREATE TEMP TABLE my_temp_table00x
(
    field_a             STRING,
    field_b             INT64
);

To export tables, I can use the Bigquery Export API :

EXPORT DATA
  OPTIONS (
    uri = 'gs://temp-file-in-gcs/test-export-*.csv',
    format = 'CSV',
    overwrite = true,
    header = true,
    field_delimiter = ',')
AS (
  SELECT field_a, field_b FROM my_temp_table00x
);

But I'm finding issues on how to divide a table into multiple 10k temp tables...

Upvotes: 0

Views: 552

Answers (1)

Hassane KA
Hassane KA

Reputation: 46

Actually, there is a simpler way using table offsets. An offset defines where you start retrieving data with your request. An offset of 200 will drop the 200 first lines that your query is supposed to return.

With that in mind, we declare two offsets :

-- an offset that will be incremented during a LOOP
DECLARE offset_it INT64 DEFAULT 0; 
-- the max offset that is the size of your table
DECLARE offset_max INT64 DEFAULT (SELECT COUNT(*) FROM test_table);

Then you loop over your table by chunks of 10k lines. The EXECUTE IMMEDIATE allows you to declare a new SQL query for each new chunk :

LOOP
    EXECUTE IMMEDIATE FORMAT("""
      EXPORT DATA
        OPTIONS (
          uri = 'gs://temp-file-in-gcs/test-export-*.csv',
          format = 'CSV',
          overwrite = true,
          header = true,
          field_delimiter = ',')
      AS (
        SELECT ROW_NUMBER() OVER (ORDER BY field1) AS table_id,
              ds.*
        FROM test_table ds
        ORDER BY table_id -- this is so we'll always have the same sorting
        LIMIT 10000 OFFSET %d; 
      );"""
      , offset_it);
    SET offset_it = offset_it + 10000;
    IF offset_it >= offset_max THEN
      LEAVE;
    END IF;
END LOOP;

Upvotes: 1

Related Questions