Is hard-coding the source and destination table names in my Python data pipeline code a security risk?

I am building data pipelines (ETL) using Python and BigQuery. My repository is stored securely on a GitHub-like service, and the pipeline is built into a Docker image that later runs on a Kubernetes cluster.

The pattern is always the same for the data pipelines (a rough sketch follows the list):

  1. Download the data from BigQuery. The source queries are hard-coded, containing the dataset.table names.
  2. Process the data using pandas.
  3. Upload the data to specific BigQuery tables. Again, the destination table names are hard-coded.
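
In rough terms, each pipeline looks like the sketch below; the query, table names, and secret path are placeholders rather than the real values:

```python
# Rough sketch only: the query, table names, and key path are placeholders.
from google.cloud import bigquery

SOURCE_QUERY = "SELECT * FROM `my_dataset.source_table`"  # hard-coded source
DESTINATION_TABLE = "my_dataset.destination_table"        # hard-coded destination

# The client is created from the service-account key (mounted as a secret),
# so the project id comes along as an attribute of the client.
client = bigquery.Client.from_service_account_json("/var/secrets/bq/key.json")

# 1. Download the data from BigQuery into a pandas DataFrame.
df = client.query(SOURCE_QUERY).to_dataframe()

# 2. Process the data using pandas (placeholder transformation).
df = df.dropna()

# 3. Upload the processed data to the hard-coded destination table.
client.load_table_from_dataframe(df, DESTINATION_TABLE).result()
```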

I let the Python BigQuery Client object manage the project id: when the client is created from the private key, the project id becomes an attribute of the Client, so I don't need to worry too much about that. The private key is passed to the container as a secret.
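
To illustrate what I mean about the project id (the mount path here is just an example of how the secret reaches the container):

```python
from google.cloud import bigquery

# The key file is mounted into the container as a Kubernetes secret;
# this path is only an example, not the real mount point.
client = bigquery.Client.from_service_account_json("/var/secrets/bq/key.json")

# The project id is picked up from the key file, so I never hard-code it.
print(client.project)
```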

My question is whether building my pipelines this way poses a security risk.

Another thing I've tried is reading the destination table names from environment variables, but that seems to add unnecessary complexity and obscurity to the code.
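
For reference, the environment-variable version I tried looks more or less like this (the variable name and fallback value are made up):

```python
import os

# Destination table name read from an environment variable set on the
# Kubernetes deployment; the fallback value is only illustrative.
DESTINATION_TABLE = os.environ.get(
    "DESTINATION_TABLE", "my_dataset.destination_table"
)
```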
