Justin Naldzin

Reputation: 21

Dataproc image version 1.4-debian9 (preview) missing AWS S3 jars (org.apache.hadoop.fs.s3a.S3AFileSystem)

Using image version 1.3-debian9 shows the jars are available (attached screenshot).

Using image version preview (1.4-debian9) gives the following error message (attached screenshot):

Py4JJavaError: An error occurred while calling o60.load.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

Command to create Dataproc cluster:

gcloud dataproc clusters create ${CLUSTER_NAME} \
    --bucket ${BUCKET} \
    --zone us-east1-d \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-size 1TB \
    --num-workers 3 \
    --worker-machine-type n1-standard-4 \
    --worker-boot-disk-size 1TB \
    --image-version=preview \
    --scopes 'https://www.googleapis.com/auth/cloud-platform' \
    --project ${PROJECT} \
    --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh,gs://dataproc-initialization-actions/connectors/connectors.sh \
    --metadata 'gcs-connector-version=1.9.16' \
    --metadata 'bigquery-connector-version=0.13.16' \
    --optional-components=ANACONDA,JUPYTER

Screenshots: 1.3-debian9 1.4-debian9

Upvotes: 2

Views: 216

Answers (1)

Dennis Huo

Reputation: 10687

This is related to the same root cause described in "Hadoop 2.9.2, Spark 2.4.0 access AWS s3a bucket".

In particular, in Dataproc 1.4 using Spark 2.4, the Hadoop dependencies are now bundled under /usr/lib/spark/lib, and these dependency versions may differ from the versions of the Hadoop classes found under /usr/lib/hadoop/lib, /usr/lib/hadoop-mapreduce/lib, etc. The issue here is that some of the auxiliary dependencies, such as the AWS connectors (and probably the Azure connectors, etc.), are not automatically included in the Spark-provided lib dir by default.
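You can see this by listing the directories involved (a quick sketch; the paths are the ones described above, and the AWS SDK jar name is globbed since its version may vary):

# On a Dataproc 1.4 node, the AWS connector jars live with the Hadoop packages...
ls /usr/lib/hadoop-mapreduce/hadoop-aws*.jar /usr/lib/hadoop-mapreduce/aws-java-sdk-bundle-*.jar
# ...but do not show up among the Hadoop jars bundled for Spark.
ls /usr/lib/spark/lib | grep -i aws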

However, neither of the answers in that question is ideal. Downloading your own copy of the AWS jarfile can be a hassle and can introduce version incompatibilities. Alternatively, adding the full Hadoop classpath to SPARK_DIST_CLASSPATH pollutes the Spark classpath with a complete duplicate set of Hadoop dependencies, which can also cause version incompatibilities (and defeats the point of packaging Spark's own copy of the Hadoop dependencies).

Instead, you can use a Dataproc initialization action with the following:

#!/bin/bash
# Name this something like add-aws-classpath.sh

# Pick up the AWS SDK bundle jar shipped with the Hadoop MapReduce packages
# (the filename includes a version number, so glob for it).
AWS_SDK_JAR=$(ls /usr/lib/hadoop-mapreduce/aws-java-sdk-bundle-*.jar | head -n 1)

# Append the AWS connector jars to Spark's distribution classpath.
cat << EOF >> /etc/spark/conf/spark-env.sh
SPARK_DIST_CLASSPATH="\${SPARK_DIST_CLASSPATH}:/usr/lib/hadoop-mapreduce/hadoop-aws.jar"
SPARK_DIST_CLASSPATH="\${SPARK_DIST_CLASSPATH}:${AWS_SDK_JAR}"
EOF

Then pass it at cluster creation time with --initialization-actions=gs://mybucket/add-aws-classpath.sh.
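For example (a sketch reusing the create command from the question, and assuming the script has been uploaded to gs://mybucket/add-aws-classpath.sh), the only change is appending the script to the existing --initialization-actions list:

gcloud dataproc clusters create ${CLUSTER_NAME} \
    --bucket ${BUCKET} \
    --zone us-east1-d \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-size 1TB \
    --num-workers 3 \
    --worker-machine-type n1-standard-4 \
    --worker-boot-disk-size 1TB \
    --image-version=preview \
    --scopes 'https://www.googleapis.com/auth/cloud-platform' \
    --project ${PROJECT} \
    --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh,gs://dataproc-initialization-actions/connectors/connectors.sh,gs://mybucket/add-aws-classpath.sh \
    --metadata 'gcs-connector-version=1.9.16' \
    --metadata 'bigquery-connector-version=0.13.16' \
    --optional-components=ANACONDA,JUPYTER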

Then it should work again.

In general, you can diff the contents of /usr/lib/hadoop-mapreduce/lib against the contents of /usr/lib/spark/lib; you should see that things like the hadoop-mapreduce jar are present in Spark in Dataproc 1.4 but not in 1.3. So if you run into any other missing jars, you can take the same approach to supplement SPARK_DIST_CLASSPATH.
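A rough way to do that comparison on a cluster node (a sketch; the directories are the ones mentioned above):

# Compare the jar listings of the two directories to spot anything Spark is missing.
diff <(ls /usr/lib/hadoop-mapreduce/lib) <(ls /usr/lib/spark/lib)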

Dataproc may patch this up by default in a future 1.4 patch version, but using the init action should be innocuous whether or not the underlying image also adds those classpaths.

Upvotes: 3
