Reputation: 4924
I am trying to do something which is very simple in other data services: run a relatively simple SQL query and return it as a dataframe in Python. I am on Windows 10 and using Python 2.7 (specifically Canopy 1.7.4).
Typically this would be done with pandas.read_sql_query, but because of some specifics of BigQuery it requires a different method, pandas.io.gbq.read_gbq.
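For reference, this is roughly the call I am making (the query and project id are placeholders):

import pandas as pd

# Placeholder legacy SQL query and project id - substitute your own
query = "SELECT * FROM [my_dataset.my_table]"
df = pd.io.gbq.read_gbq(query, project_id='<your project id>')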
This method works fine unless the query returns a large result. If you run a big query on BigQuery, you get the error:
GenericGBQException: Reason: responseTooLarge, Message: Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
This was asked and answered before in this question, but neither of the solutions is relevant for my case:
Python BigQuery allowLargeResults with pandas.io.gbq
One solution is for Python 3, so it is a nonstarter. The other gives an error because I am unable to set my credentials as an environment variable in Windows:
ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
I was able to download the JSON credentials file, and I have set it as an environment variable in the few ways I know how, but I still get the above error. Do I need to load the file in some way in Python? It seems to be looking for it but is unable to find it. Is there a special way to set it as an environment variable in this case?
Upvotes: 0
Views: 1039
Reputation: 3448
Here is code that fully works with Python 2.7 on Windows:
import pandas as pd
my_qry="<insert your big query here>"
### Here put the data from your service account credentials file - all fields are available from there ###
my_file="""{
"type": "service_account",
"project_id": "cb4recs",
"private_key_id": "<id>",
"private_key": "<your private key>\n",
"client_email": "<email>",
"client_id": "<id>",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "<x509 url>"
}"""
df=pd.read_gbq(my_qry,project_id='<your project id>',private_key=my_file)
That's it :)
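As far as I know, private_key can also be given the path to the downloaded JSON file instead of its contents, if you prefer not to paste the key into your script:

my_key_path = 'path/to/credentials.json'  # path to the service account JSON file
df = pd.read_gbq(my_qry, project_id='<your project id>', private_key=my_key_path)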
Upvotes: 0
Reputation: 2406
You can do it in Python 2.7 by changing the default dialect from legacy to standard in the pd.read_gbq function.
pd.read_gbq(query, 'my-super-project', dialect='standard')
Indeed, you can read in the BigQuery documentation for the allowLargeResults parameter:
AllowLargeResults: For standard SQL queries, this flag is ignored and large results are always allowed.
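For example, a sketch of a full call (the private_key argument is optional and only one way to authenticate; the query and project are placeholders):

import pandas as pd

# Standard SQL uses backtick-quoted table names instead of legacy [brackets]
query = "SELECT * FROM `my_dataset.my_table`"
df = pd.read_gbq(query, 'my-super-project', dialect='standard',
                 private_key='path/to/credentials.json')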
Upvotes: 2
Reputation: 4924
I have found two ways of directly importing the JSON credentials file. Both are based on the original answer in Python BigQuery allowLargeResults with pandas.io.gbq.
1) Credit to Tim Swast
First
pip install google-api-python-client
pip install google-auth
pip install google-cloud-core
then replace
credentials = GoogleCredentials.get_application_default()
in create_service() with
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('path/file.json')
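Putting it together, a rough sketch of what create_service() from the linked answer ends up looking like (the function name and structure come from that answer, so treat this as an approximation):

from google.oauth2 import service_account
from googleapiclient.discovery import build

def create_service():
    # Load credentials directly from the downloaded service account JSON file
    credentials = service_account.Credentials.from_service_account_file('path/file.json')
    # Build the BigQuery v2 client used by the rest of the linked answer's code
    return build('bigquery', 'v2', credentials=credentials)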
2)
Set the environment variable manually in the code like
import os,os.path
os.environ['GOOGLE_APPLICATION_CREDENTIALS']=os.path.expanduser('path/file.json')
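With the variable set in the process environment before the credentials are requested, the original GoogleCredentials.get_application_default() call from the linked answer should then find the file, roughly like this:

import os, os.path
from oauth2client.client import GoogleCredentials

# The variable must be set before the default credentials are looked up
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.expanduser('path/file.json')
credentials = GoogleCredentials.get_application_default()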
I prefer method 2 since it does not require new modules to be installed and is also closer to the intended use of the JSON credentials.
Note:
You must create a destinationTable and add the information to run_query()
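For reference, the query job configuration that run_query() builds needs entries roughly like these (all ids are placeholders, and the exact shape of run_query() comes from the linked answer):

job_config = {
    'configuration': {
        'query': {
            'query': '<insert your big query here>',
            # Large results must be allowed and written to a destination table
            'allowLargeResults': True,
            'destinationTable': {
                'projectId': '<your project id>',
                'datasetId': '<your dataset id>',
                'tableId': '<your table id>',
            },
        },
    },
}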
Upvotes: 1