Reputation: 10048
I want to create a CSV file which contains the results of query. This CSV file will live in Google Cloud Storage. (This query is around 15GB) I need it to be a single file. Is it possible, if so how?
CREATE OR REPLACE TABLE `your-project.your-dataset.chicago_taxitrips_mod` AS (
WITH
taxitrips AS (
SELECT
trip_start_timestamp,
trip_end_timestamp,
trip_seconds,
trip_miles,
pickup_census_tract,
dropoff_census_tract,
pickup_community_area,
dropoff_community_area,
fare,
tolls,
extras,
trip_total,
payment_type,
company,
pickup_longitude,
pickup_latitude,
dropoff_longitude,
dropoff_latitude,
IF((tips/fare >= 0.2),
1,
0) AS tip_bin
FROM
`bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE
trip_miles > 0
AND fare > 0)
SELECT
trip_start_timestamp,
trip_end_timestamp,
trip_seconds,
trip_miles,
pickup_census_tract,
dropoff_census_tract,
pickup_community_area,
dropoff_community_area,
fare,
tolls,
extras,
trip_total,
payment_type,
company,
tip_bin,
ST_AsText(ST_SnapToGrid(ST_GeogPoint(pickup_longitude,
pickup_latitude), 0.1)) AS pickup_grid,
ST_AsText(ST_SnapToGrid(ST_GeogPoint(dropoff_longitude,
dropoff_latitude), 0.1)) AS dropoff_grid,
ST_Distance(ST_GeogPoint(pickup_longitude,
pickup_latitude),
ST_GeogPoint(dropoff_longitude,
dropoff_latitude)) AS euclidean,
CONCAT(ST_AsText(ST_SnapToGrid(ST_GeogPoint(pickup_longitude,
pickup_latitude), 0.1)), ST_AsText(ST_SnapToGrid(ST_GeogPoint(dropoff_longitude,
dropoff_latitude), 0.1))) AS loc_cross
FROM
taxitrips
LIMIT
100000000
)
Upvotes: 1
Views: 1540
Reputation: 941
Exporting 15GB to a single CSV file is not possible (to multiple files is possible). I tried your same query (Bytes processed 15.66 GB) then tried to export it to a CSV file in GCS but failed with this error
Table gs://[my_bucket]/bq_export/test.csv too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data.
BQ Documentation only allows you to export up to 1 GB of table data to a single file. Since the table exceeds 1GB then you have to use a wildcard like:
gs://your-bucket-name/csvfilename*.csv
Not sure why would you like the export csv file to be in a single file but IMHO it's too large to be in a single file. writing it to multiple files will be a lot faster since BQ would use its parallelism to write the output using multiple threads.
Upvotes: 1
Reputation: 59155
If BigQuery needs to output multiple files, you can then concatenate them into a single one with a gsutil
operation for files in GCS:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
Note that there is a limit (currently 32) to the number of components that can be composed in a single operation.
Upvotes: 1