Reputation: 367
I currently have a function that reads an SQL file to execute a query on Google's BigQuery.
import pandas as pd
def func1(arg1,arg2):
with open('query.sql', 'r') as sqlfile:
sql_query= sqlfile.read()
df = pd.read_gbq(sql_query.format(arg1=arg1,arg2=arg2)
query.sql
SELECT *
FROM bigquery.dataset
WHERE col1= {arg1}
AND col2 = {arg2}
The dataset location is hardcoded in the SQL file itself and as such, makes it hard to make changes if I were to change the dataset location (I.E I would have to individually go to each SQL file and manually change the "From" clause. Since I have many SQL files, it becomes cumbersome to manually edit each individual SQL file's from clause)
So my questions is, what is the best way to make the dataset location dynamic? Ideally, the dataset location should be a variable, but the question is where to place the variable. If it is a variable, is it better to pass it in as a function argument? I.E func1 will have one more argument, called dataset_loc
import pandas as pd
def func1(arg1,arg2,dataset_loc):
with open('query.sql', 'r') as sqlfile:
sql_query= sqlfile.read()
df = pd.read_gbq(sql_query.format(arg1=arg1,arg2=arg2,dataset_loc=dataset_loc)
query.sql
SELECT *
FROM {dataset_loc}
WHERE col1 = {arg1}
AND col2 = {arg2}
Would like to know what is the best way to go around doing this. Thank you
Upvotes: 0
Views: 416
Reputation: 3166
If you are using the same functions to operate on different datasets, it is a good practice to make the function “dataset agnostic”, i.e to pass the dataset as a parameter. For me, your second example is the good approach to do it.
Also, keep in mind that now, your application might be small, but you need to prepare things for scaling up in the future. And definitely, you don’t want to have to write the same SQL query file for everyone of your datasets.
It depends on your use case but as a general rule it is recommended to manage the parameters of an application out of the code. To do this config files are used and as you are using Python take a look at this Python library which is useful to read them.
Upvotes: 1