Reputation: 301
I am currently using Python to pull data from BigQuery, and after getting the result I need to convert it to JSON format. How do I convert the result to JSON?
query_job2 = client_bq.query(query)
query_job2.result()
rows = list(query_job2.result())  # Waits for query to finish
response = dict()
"""
Creating a nested dictionary with the tables as the keys; inside, each respective table will hold cost as keys and
have a list of values
"""
for row in rows:
    table = get_table_name(str(row.query))
    start_time = int(row.start_time.timestamp())
    end_time = int(row.end_time.timestamp())
    if table in response:
        if row.cost in response[table]:
            response[table] = list(response[table])
            response[table].append((str(row.creation_time), start_time, end_time, row.cost, str(row.query)))
    else:
        response[table] = {}
        response[table] = (str(row.creation_time), start_time, end_time, row.cost, str(row.query))
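For reference, once a nested dictionary like response is built, it can be turned into a JSON string with the standard json module. This is only a sketch of that last step, assuming the stored values are JSON-friendly (tuples come out as JSON arrays, and default=str catches anything json cannot handle natively, such as Decimal or datetime values):

import json

# Serialize the nested dict built above into a JSON string.
json_payload = json.dumps(response, default=str, indent=2)
print(json_payload)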
Here is the query I am using:
with data AS (
SELECT
creation_time,
total_bytes_processed,
query
FROM `project.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
where creation_time > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -60 SECOND) AND job_type = "QUERY"
Group BY creation_time, job_id, total_bytes_processed, query
ORDER BY total_bytes_processed DESC
)
select as value
array_agg(struct( creation_time,
regexp_extract(query, r'(?i)\sfrom\s+`?(?:[\w-]+\.)*([\w-]+\.[\w-]+)`?\s' ) as table,
(total_bytes_processed/1099511627776) * 5 as cost,
query) order by (total_bytes_processed/1099511627776) * 5 desc limit 1)[offset(0)]
from data
group by timestamp_trunc(creation_time, minute)
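The cost expression in the query divides total_bytes_processed by 1099511627776 (one TiB, i.e. 2**40 bytes) and multiplies by 5, which corresponds to $5 per TiB of on-demand pricing. A tiny illustrative helper doing the same arithmetic (the name estimate_cost_usd is made up here):

def estimate_cost_usd(total_bytes_processed):
    # Same formula as in the query: $5 per TiB scanned.
    return (total_bytes_processed / 1099511627776) * 5

# Example: a query scanning 250 GiB costs roughly $1.22.
print(estimate_cost_usd(250 * 2**30))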
"""
Upvotes: 2
Views: 4807
Reputation: 594
The question can be separated into two parts:
How to read data from Google BigQuery into pandas. You can find the answer here:
from google.cloud import bigquery
client = bigquery.Client()
query_sql = """SELECT col FROM `table`"""
df = client.query(query_sql).to_dataframe()
After you obtain the data in a pandas DataFrame, you can use the to_json method to convert it to a JSON string:
df.to_json(orient='index')
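For example, orient='records' produces a JSON array of row objects, which can be parsed back or written straight to a file. A short sketch reusing the df from above (the result.json filename is just an example):

import json

json_str = df.to_json(orient='records')  # e.g. '[{"col": ...}, {"col": ...}]'
parsed = json.loads(json_str)            # back to Python objects if needed

with open('result.json', 'w') as f:
    f.write(json_str)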
Alternatively, you can export data directly from BigQuery as JSON files, either from the UI as shown here or using the Python client.
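A minimal sketch of the client-side export, assuming you export a whole table to Google Cloud Storage; the table and bucket names below are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder identifiers: replace with your own table and bucket.
table_id = 'my-project.my_dataset.my_table'
destination_uri = 'gs://my-bucket/export-*.json'

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
extract_job = client.extract_table(table_id, destination_uri, job_config=job_config)
extract_job.result()  # Waits for the export job to finish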
The first method is better if you need to manipulate the data in memory using pandas; the second lets you serialize your data at a much greater scale.
Upvotes: 3