jay12345
jay12345

Reputation: 71

Bigquery export splits into multiple files with some empty files

I am trying to use the bigquery export functionality to push data out to GCS in json format. At the end of the process inorder to validate the count of exported records in the GCS file, I am creating an external table with auto schema detection just to take a count of records in the GCS files exported.

This works for single exported files. But for tables greater than 1gb in size, i use the wild card inorder to split into multiple files. This results in multiple files with some empty files as well created.

The empty files are causing an error while querying the external table : "400 Schema has no fields".

Please suggest any ideas to:

  1. Either make sure that empty files do not get created in the export operation for multiple files scenario
  2. To ignore empty files in the external table creation.
  3. Any other way to take count of records in GCS after the export operation

Upvotes: 7

Views: 3464

Answers (1)

Fasoeu
Fasoeu

Reputation: 1292

I had the same problem but I found a workaround: it seems a TEMP TABLE does the trick.

(EDIT: reading the doc I noticed "export data" has always been described for BigQuery tables, non for custom selects. And since I never experienced empty files when exporting real tables, I gave temp tables the same chance)

Imagine we have the following query:

EXPORT DATA OPTIONS(
    uri='gs://mybucket/extract-here/*.csv.gz'
    , format='CSV'
    , compression='GZIP'
    , overwrite=true
    , header=true
    , field_delimiter=","
) AS (
    WITH mytable AS (
        SELECT col FROM UNNEST([1,2,3,4,5,6,7,8]) AS col
    )
    SELECT * FROM mytable
);

You can rewrite it as following:

BEGIN
    CREATE TEMP TABLE _SESSION.tmpExportTable AS (
        WITH mytable AS (
            SELECT col FROM UNNEST([1,2,3,4,5,6,7,8]) AS col
        )
        SELECT * FROM mytable
    );
    EXPORT DATA OPTIONS(
        uri='gs://mybucket/extract-here/*.csv.gz'
        , format='CSV'
        , compression='GZIP'
        , overwrite=true
        , header=true
        , field_delimiter=","
    ) AS
    SELECT * FROM _SESSION.tmpExportTable;
END;

Upvotes: 12

Related Questions