thn

Reputation: 63

Passing Array Parameter to SQL for BigQuery in Python

I have a set of IDs (~200k) and I need to get all the rows in a BigQuery table with those IDs. I tried to construct a list in Python and pass it as a parameter to the SQL query using @, but I get a TypeError: 'ArrayQueryParameter' object is not iterable error. Here is the code I tried (very similar to https://cloud.google.com/bigquery/querying-data#running_parameterized_queries):

id_list = ['id1', 'id2'] 
query = """
    SELECT id
    FROM `my-db`
    WHERE id in UNNEST(@ids)
"""
query_job = client.run_async_query(
    str(uuid.uuid4()),
    query,
    query_parameters=(
        bigquery.ArrayQueryParameter('ids', 'ARRAY<STRING>', id_list)
    )
)

Upvotes: 3

Views: 10185

Answers (4)

Andrey E

Reputation: 866

If you want to use a simple client.query call rather than the client.run_async_query shown in the other answers, you can pass an additional QueryJobConfig parameter. Simply add your arrays to query_parameters using bigquery.ArrayQueryParameter.

The following code worked for me:

    query = f"""
                SELECT distinct pipeline_commit_id, pipeline_id, name
                FROM `{self.project_id}.{self.dataset_id}.pipelines_{self.table_suffix}`,
                UNNEST(labels) AS label
                where label.value IN UNNEST(@labels)
           """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter('labels', 'STRING', labels)
        ]
    )
    query_job = self.client.query(query, job_config=job_config)

Based on the examples here: https://cloud.google.com/bigquery/docs/parameterized-queries
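
Applied to the question's id_list, a minimal sketch of the same pattern might look like this (assuming a default bigquery.Client() and the table name from the question):

from google.cloud import bigquery

client = bigquery.Client()

id_list = ['id1', 'id2']
query = """
    SELECT id
    FROM `my-db`
    WHERE id IN UNNEST(@ids)
"""
# Pass the element type ('STRING'), not 'ARRAY<STRING>', for an array parameter.
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ArrayQueryParameter('ids', 'STRING', id_list),
    ]
)
rows = client.query(query, job_config=job_config).result()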

Upvotes: 0

Krystian

Reputation: 11

The answers above are a better solution, but you may find a use for this too when quickly drafting something in notebooks:

Turn the list into a comma-separated string of quoted values, then interpolate that string into the query like so:

from google.cloud import bigquery

id_list = ['id1', 'id2']
# Format the list into a query-valid string: "id1","id2"
id_string = '"' + '","'.join(id_list) + '"'

client = bigquery.Client()
query = f"""
    SELECT id
    FROM `my-db`
    WHERE id IN ({id_string})
"""
query_job = client.query(query)
results = query_job.result()

Upvotes: 0

Willian Fuks

Reputation: 11777

Probably the issue here is that you are not passing a tuple to the function.

Try adding a comma before closing the parenthesis, like so:

id_list = ['id1', 'id2'] 
query = """
    SELECT id
    FROM `my-db`
    WHERE id in UNNEST(@ids)
"""
query_job = client.run_async_query(
    str(uuid.uuid4()),
    query,
    query_parameters=(
        bigquery.ArrayQueryParameter('ids', 'STRING', id_list),
    )
)

In Python if you do:

t = (1)

and then run:

type(t)

You will find the result to be int. But if you do:

t = (1,)

Then it results in a tuple.
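
For example, a quick check in the interpreter shows the difference:

t = (1)
print(type(t))   # <class 'int'> -- parentheses alone do not make a tuple

t = (1,)
print(type(t))   # <class 'tuple'> -- the trailing comma does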

Upvotes: 6

Elliott Brossard

Reputation: 33705

You need to use 'STRING' rather than 'ARRAY<STRING>' for the array element type, e.g.:

query_parameters=(
    bigquery.ArrayQueryParameter('ids', 'STRING', id_list),
)

The example from the querying data topic is:

def query_array_params(gender, states):
    client = bigquery.Client()
    query = """
        SELECT name, sum(number) as count
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE gender = @gender
        AND state IN UNNEST(@states)
        GROUP BY name
        ORDER BY count DESC
        LIMIT 10;
        """
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=(
            bigquery.ScalarQueryParameter('gender', 'STRING', gender),
            bigquery.ArrayQueryParameter('states', 'STRING', states)))
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())

Upvotes: 2
