Reputation: 73
I'm trying to adapt my Spark jobs, which currently run on an on-premise Hadoop cluster, so that they can run both on-premise and on Google Cloud.
I was thinking of having a method that checks whether a given environment variable is defined, to determine if the code is running in the cloud:
import os

def run_on_gcp():
    return os.getenv("ENVIRONMENT_VARIABLE") is not None
I wanted to know: what would be an ENVIRONMENT_VARIABLE that is always defined on Google Cloud and accessible from a Dataproc instance? I was thinking of PROJECT_ID or BUCKET. Which variable do you usually use? How do you usually detect programmatically where your code is running? Thanks
Upvotes: 1
Views: 928
Reputation: 2685
When you submit a job to Dataproc, you can pass your own arguments, such as a profile name or cluster name:
CMD="--job mytestJob \
--job-args path=gs://tests/report\
profile=gcp \
cluster_name=${GCS_CLUSTER}"
gcloud dataproc jobs submit pyspark \
--cluster ${GCS_CLUSTER} \
--py-files ${PY_FILES} \
--async \
${PY_MAIN} \
-- ${CMD}
Then you can detect those args in your program:
import os

# "args" holds the submitted arguments (e.g. as parsed with argparse)
environment = {
    'PYSPARK_JOB_ARGS': ' '.join(args.job_args) if args.job_args else ''
}

# Turn the key=value pairs (e.g. profile=gcp) into a dict
job_args = dict()
if args.job_args:
    job_args_tuples = [arg_str.split('=') for arg_str in args.job_args]
    print('job_args_tuples: %s' % job_args_tuples)
    job_args = {a[0]: a[1] for a in job_args_tuples}

print('\nRunning job %s ...\n environment is %s\n' % (args.job_name, environment))
os.environ.update(environment)
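Building on that, here is a minimal sketch of the check the question asks for, assuming a profile=gcp pair like the one above is always passed at submit time (the helper name run_on_gcp is just illustrative):

def run_on_gcp(job_args):
    # True when the job was submitted with profile=gcp, False on-premise
    return job_args.get('profile') == 'gcp'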
Upvotes: 1
Reputation: 7058
For this purpose you can use DATAPROC_VERSION. If you submit the following PySpark job to Dataproc, it will print out the version you are using (1.3 in my case):
#!/usr/bin/python
import pyspark, os
sc = pyspark.SparkContext()
print(os.getenv("DATAPROC_VERSION"))
Upvotes: 2