Malcode

Reputation: 301

Is there a way to convert results returned from bigquery to Json format using Python?

Currently I am using Python to pull data from BigQuery. After getting the result, I need to convert it to JSON format. How can I convert the result to JSON?

    query_job2 = client_bq.query(query)
    rows = list(query_job2.result())  # waits for the query to finish

    # Build a nested dictionary with the table names as keys; each table
    # maps to a list of (creation_time, start_time, end_time, cost, query)
    # tuples.
    response = {}
    for row in rows:
        table = get_table_name(str(row.query))
        start_time = int(row.start_time.timestamp())
        end_time = int(row.end_time.timestamp())

        entry = (str(row.creation_time), start_time, end_time, row.cost, str(row.query))
        response.setdefault(table, []).append(entry)
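Once the rows are grouped into a dictionary like this, the standard-library json module can serialize it directly. A minimal sketch, with a hypothetical sample entry standing in for the real BigQuery result:

```python
import json

# Hypothetical result of the grouping step above: table name -> list of
# (creation_time, start_time, end_time, cost, query) tuples.
response = {
    "dataset.my_table": [
        ("2021-01-01 00:00:00", 1609459200, 1609459260, 0.25,
         "SELECT * FROM `project.dataset.my_table`"),
    ],
}

# json.dumps handles dicts, lists, strings and numbers natively;
# tuples are serialized as JSON arrays.
payload = json.dumps(response, indent=2)
print(payload)
```

Note that values such as datetimes must be converted to strings or numbers first (as the code above already does with `str(...)` and `timestamp()`), since json.dumps cannot serialize them natively.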

Here is the query I am using:

    WITH data AS (
      SELECT
        creation_time,
        total_bytes_processed,
        query
      FROM `project.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
      WHERE creation_time > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -60 SECOND)
        AND job_type = "QUERY"
      GROUP BY creation_time, job_id, total_bytes_processed, query
      ORDER BY total_bytes_processed DESC
    )
    SELECT AS VALUE
      ARRAY_AGG(STRUCT(creation_time,
        REGEXP_EXTRACT(query, r'(?i)\sfrom\s+`?(?:[\w-]+\.)*([\w-]+\.[\w-]+)`?\s') AS table,
        (total_bytes_processed / 1099511627776) * 5 AS cost,
        query) ORDER BY (total_bytes_processed / 1099511627776) * 5 DESC LIMIT 1)[OFFSET(0)]
    FROM data
    GROUP BY TIMESTAMP_TRUNC(creation_time, minute)

Upvotes: 2

Views: 4807

Answers (1)

gidutz

Reputation: 594

The question can be separated into two parts:

  1. How to read data from Google BigQuery to Pandas. You can find the answer here

    from google.cloud import bigquery
    
    client = bigquery.Client()
    query_sql = """SELECT col FROM `table`"""
    
    df = client.query(query_sql).to_dataframe()
    
  2. After you obtain the data in a pandas DataFrame, you can use the to_json method to convert it to a JSON string:

    df.to_json(orient='index')
    

Alternatively you can export data directly from BigQuery as JSON files, either from the UI as shown here or using a python client.

The first method is better if you need to manipulate the data in memory using pandas; the second allows you to serialize your data at a much greater scale.
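Putting the two steps together, here is a minimal sketch. An in-memory DataFrame with made-up columns stands in for the result of client.query(query_sql).to_dataframe(), since running the real query requires BigQuery credentials:

```python
import pandas as pd

# Stand-in for: df = client.query(query_sql).to_dataframe()
df = pd.DataFrame(
    {"table": ["dataset.t1", "dataset.t2"], "cost": [0.25, 1.5]}
)

# orient='index' keys the JSON object by row label; orient='records'
# produces a list of one object per row, which is often easier for
# downstream consumers.
print(df.to_json(orient="index"))
print(df.to_json(orient="records"))
```

Choosing the orient value up front matters because it fixes the shape of the JSON your consumers will parse.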

Upvotes: 3
