Reputation: 5764
I'm using Datalab for a Python notebook that loads data from Cloud Storage into BigQuery, basically following this example.
I then noticed that my original data in the Cloud Storage bucket is in the EU (europe-west3), and the VM that runs Datalab is in the same region, but the final data in BigQuery ends up in the US.
Following this post, I tried setting the dataset's location in code, but it did not work: there is no such option in the datalab.bigquery Python module.
So my question is: How do I set the location (region) for the BigQuery dataset and the tables it contains?
This is my code:
# data: https://www.kaggle.com/benhamner/sf-bay-area-bike-share/data
%%gcs read --object gs://my_bucket/kaggle/station.csv --variable stations

from io import StringIO
import pandas as pd
import datalab.bigquery as bq

# the CSV is read as bytes first
df_stations = pd.read_csv(StringIO(stations))
schema = bq.Schema.from_data(df_stations)
# Create an empty dataset
#bq.Dataset('kaggle_bike_rentals').create(location='europe-west3-a')
bq.Dataset('kaggle_bike_rentals').create()
# Create an empty table within the dataset
table_stations = bq.Table('kaggle_bike_rentals.stations').create(schema=schema, overwrite=True)
# Load data directly from Cloud Storage into the BigQuery table;
# the locally loaded Pandas DataFrame isn't used here
table_stations.load('gs://my_bucket/kaggle/station.csv', mode='append', source_format='csv', csv_options=bq.CSVOptions(skip_leading_rows=1))
Update: Meanwhile I manually created the dataset in the BigQuery web UI and referenced it in code without creating it there. Now an exception is raised if the dataset does not exist, which prevents accidentally creating one in code that would end up in the default US location.
Upvotes: 0
Views: 2492
Reputation: 135
BigQuery locations are set at the dataset level. Tables inherit their location from the dataset they belong to.
Setting the location of a dataset, at least outside of Datalab (using the google-cloud-bigquery client library):
from google.cloud import bigquery
bigquery_client = bigquery.Client(project='your_project')
dataset_ref = bigquery_client.dataset('your_dataset_name')
dataset = bigquery.Dataset(dataset_ref)
dataset.location = 'EU'  # must be set before create_dataset() is called
dataset = bigquery_client.create_dataset(dataset)
Based on the code snippet from here: https://cloud.google.com/bigquery/docs/datasets
Upvotes: 0