Reputation: 65
I work in a Google Cloud environment where I don't have direct internet access, so I use a proxy to reach the internet. I'm trying to launch a Dataflow job. When I run a simple wordcount.py with Dataflow, I get this error:
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 4.750968074377858 seconds before retrying _uncached_gcs_file_copy because we caught exception: httplib2.socks.HTTPError: (403, b'Forbidden')
Traceback for above exception (most recent call last):
File "/opt/py38/lib64/python3.8/site-packages/apache_beam/utils/retry.py", line 275, in wrapper
return fun(*args, **kwargs)
File "/opt/py38/lib64/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 631, in _uncached_gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/opt/py38/lib64/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 735, in stage_file
response = self._storage_client.objects.Insert(request, upload=upload)
File "/opt/py38/lib64/python3.8/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 1152, in Insert
return self._RunMethod(
File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/base_api.py", line 728, in _RunMethod
http_response = http_wrapper.MakeRequest(
File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 359, in MakeRequest
retry_func(ExceptionRetryArgs(http, http_request, e, retry,
File "/opt/py38/lib64/python3.8/site-packages/apache_beam/io/gcp/gcsio_overrides.py", line 45, in retry_func
return http_wrapper.HandleExceptionsAndRebuildHttpConnections(retry_args)
File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 304, in HandleExceptionsAndRebuildHttpConnections
raise retry_args.exc
File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 348, in MakeRequest
return _MakeRequestNoRetry(
File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 397, in _MakeRequestNoRetry
info, content = http.request(
File "/opt/py38/lib64/python3.8/site-packages/google_auth_httplib2.py", line 209, in request
self.credentials.before_request(self._request, method, uri, request_headers)
File "/opt/py38/lib64/python3.8/site-packages/google/auth/credentials.py", line 134, in before_request
self.refresh(request)
File "/opt/py38/lib64/python3.8/site-packages/google/auth/compute_engine/credentials.py", line 111, in refresh
self._retrieve_info(request)
File "/opt/py38/lib64/python3.8/site-packages/google/auth/compute_engine/credentials.py", line 87, in _retrieve_info
info = _metadata.get_service_account_info(
File "/opt/py38/lib64/python3.8/site-packages/google/auth/compute_engine/_metadata.py", line 234, in get_service_account_info
return get(request, path, params={"recursive": "true"})
File "/opt/py38/lib64/python3.8/site-packages/google/auth/compute_engine/_metadata.py", line 150, in get
response = request(url=url, method="GET", headers=_METADATA_HEADERS)
File "/opt/py38/lib64/python3.8/site-packages/google_auth_httplib2.py", line 119, in __call__
response, data = self.http.request(
File "/opt/py38/lib64/python3.8/site-packages/httplib2/__init__.py", line 1701, in request
(response, content) = self._request(
File "/opt/py38/lib64/python3.8/site-packages/httplib2/__init__.py", line 1421, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/opt/py38/lib64/python3.8/site-packages/httplib2/__init__.py", line 1343, in _conn_request
conn.connect()
File "/opt/py38/lib64/python3.8/site-packages/httplib2/__init__.py", line 1026, in connect
self.sock.connect((self.host, self.port) + sa[2:])
File "/opt/py38/lib64/python3.8/site-packages/httplib2/socks.py", line 504, in connect
self.__negotiatehttp(destpair[0], destpair[1])
File "/opt/py38/lib64/python3.8/site-packages/httplib2/socks.py", line 465, in __negotiatehttp
raise HTTPError((statuscode, statusline[2]))
My service account has these roles:
BigQuery Data Editor, BigQuery User, Dataflow Developer, Dataflow Worker, Service Account User, Storage Admin
The instance has Cloud API access scopes set to: Allow full access to all Cloud APIs
What is the problem?
Upvotes: 0
Views: 617
Reputation: 11
I encountered a similar problem when using Python Flask for the backend and Keycloak for authentication. The cause was the proxy setting, as mentioned above. I solved it by unsetting the existing proxy variables before starting the Flask app:
unset http_proxy
unset https_proxy
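If changing the shell environment is not convenient, the same effect can probably be had from inside the Python process before any HTTP client is created. This is only a sketch: the helper name `clear_proxy_vars` and the list of variable names are my own assumptions, not part of Flask or Keycloak, and it only affects the current process.

```python
import os

# Assumed set of variable names; both lower- and upper-case spellings are
# commonly honoured by HTTP libraries.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def clear_proxy_vars(env=os.environ):
    """Remove proxy variables from env and return the names that were removed."""
    removed = []
    for name in PROXY_VARS:
        # pop() with a default avoids a KeyError when the variable is not set.
        if env.pop(name, None) is not None:
            removed.append(name)
    return removed


if __name__ == "__main__":
    print("Removed:", clear_proxy_vars())
```

Note that this removes the proxy for every request the process makes, so it only fits cases where nothing in the app needs the proxy.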
Upvotes: 0
Reputation: 2363
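Based on the comment from @luca, the error above was solved by using an internal proxy that allows access to the internet. The traceback shows the credential refresh request to the metadata server being sent through the proxy, which rejects it with 403 Forbidden. Add --no_use_public_ips to the command and set no_proxy="metadata.google.internal,www.googleapis.com,dataflow.googleapis.com,bigquery.googleapis.com" so that requests to those hosts bypass the proxy.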
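As a rough illustration, the no_proxy part could also be applied from the launcher script itself, before the pipeline is built. This is a sketch under assumptions: `merge_no_proxy` and `apply_no_proxy` are hypothetical helper names, the host list is copied from the value above, and whether a given HTTP library reads `no_proxy` or `NO_PROXY` varies, so both are set.

```python
import os

# Hosts copied from the no_proxy value in this answer.
GOOGLE_HOSTS = [
    "metadata.google.internal",
    "www.googleapis.com",
    "dataflow.googleapis.com",
    "bigquery.googleapis.com",
]


def merge_no_proxy(existing, hosts):
    """Return a comma-separated no_proxy value: existing entries, then new hosts."""
    entries = [e.strip() for e in existing.split(",") if e.strip()]
    for host in hosts:
        if host not in entries:  # keep existing entries, skip duplicates
            entries.append(host)
    return ",".join(entries)


def apply_no_proxy(env=os.environ, hosts=GOOGLE_HOSTS):
    """Set no_proxy and NO_PROXY in env so the listed hosts bypass the proxy."""
    current = env.get("no_proxy") or env.get("NO_PROXY") or ""
    value = merge_no_proxy(current, hosts)
    env["no_proxy"] = value
    env["NO_PROXY"] = value
    return value


if __name__ == "__main__":
    print(apply_no_proxy())
```

Call `apply_no_proxy()` at the top of the script, before the pipeline and its GCS client are created. The flag mentioned above still has to be passed with the other pipeline arguments.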
Upvotes: 2