Reputation: 8936
I am going to set require_partition_filter to True on bigquery Table. But I can access only TableReference instead of Table. How to do this?
UPDATE
Maybe I did not express my question clearly. I need to write python program to do this. I would not like to set the configuration by commands or sql, because there are too many tables. In this program, I can generate TableReference table_ref
in the following program. But how to set require_parition_filter
on table_ref
?
def table(client, dataset_name, table_name):
dataset = client.dataset(dataset_name)
table_ref = dataset.table(table_name)
return table_ref
job = client.load_table_from_uri(
glob, # google cloud storage bucket
table_ref, # returned by table() function above
job_id='123',
job_config=config, # at beginning, set `require_parition_filter` here, but this deprecated
)
Upvotes: 0
Views: 2893
Reputation: 7058
Note that the preferred way would be to get a bigquery.Table
object from TableReference
, change require_partition_filter
and patch the table (similar example with expiration date). However, as you mention that you can only access TableReference
, you can also set
TimePartitioning.require_partition_filter
(deprecated according to docs) when creating the table with a load job. For example:
job_config = bigquery.LoadJobConfig(
schema = [
bigquery.SchemaField("foo", "STRING"),
],
time_partitioning = bigquery.TimePartitioning(
type_=bigquery.TimePartitioningType.DAY,
require_partition_filter = True
)
)
and it works as expected:
$ bq query "SELECT * FROM test.require_filter"
Waiting on JOB_ID ... (0s) Current status: DONE
Error in query string: Error processing job 'JOB_ID': Cannot query over table 'test.require_filter' without a filter over column(s) '_PARTITION_LOAD_TIME',
'_PARTITIONDATE', '_PARTITIONTIME' that can be used for partition elimination
Full code:
import pandas as pd
from google.cloud import bigquery
PROJECT = "PROJECT_ID"
DATASET = "test"
TABLE = "require_filter"
def table(client, dataset_name, table_name):
dataset = client.dataset(dataset_name)
table_ref = dataset.table(table_name)
return table_ref
client = bigquery.Client(project=PROJECT)
job_config = bigquery.LoadJobConfig(
schema = [
bigquery.SchemaField("foo", "STRING"),
],
time_partitioning = bigquery.TimePartitioning(
type_=bigquery.TimePartitioningType.DAY,
require_partition_filter = True
)
)
data = {"foo": ["bar"]}
df = pd.DataFrame(data)
table_ref = table(client, DATASET, TABLE)
load_job = client.load_table_from_dataframe(
df,
table_ref,
job_config = job_config
)
result = load_job.result()
Upvotes: 0
Reputation: 3642
How to do this?
As mentioned in this answer you can use an ALTER command to alter your table as follow:
#standardSQL
ALTER TABLE IF EXISTS mydataset.newtable
SET OPTIONS(
require_partition_filter = false
)
You can change the require_partition_filter
back to true if needed using the same command
Upvotes: 2