luca

Reputation: 65

httplib2.socks.HTTPError: (403, b'Forbidden') python apache-beam dataflow

I work in a Google Cloud environment that has no direct internet access, so I use a proxy to reach the internet. I'm trying to launch a Dataflow job. When I run a simple wordcount.py with Dataflow, I get this error:

WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 4.750968074377858 seconds before retrying _uncached_gcs_file_copy because we caught exception: httplib2.socks.HTTPError: (403, b'Forbidden')
 Traceback for above exception (most recent call last):
  File "/opt/py38/lib64/python3.8/site-packages/apache_beam/utils/retry.py", line 275, in wrapper
    return fun(*args, **kwargs)
  File "/opt/py38/lib64/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 631, in _uncached_gcs_file_copy
    self.stage_file(to_folder, to_name, f, total_size=total_size)
  File "/opt/py38/lib64/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 735, in stage_file
    response = self._storage_client.objects.Insert(request, upload=upload)
  File "/opt/py38/lib64/python3.8/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 1152, in Insert
    return self._RunMethod(
  File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/base_api.py", line 728, in _RunMethod
    http_response = http_wrapper.MakeRequest(
  File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 359, in MakeRequest
    retry_func(ExceptionRetryArgs(http, http_request, e, retry,
  File "/opt/py38/lib64/python3.8/site-packages/apache_beam/io/gcp/gcsio_overrides.py", line 45, in retry_func
    return http_wrapper.HandleExceptionsAndRebuildHttpConnections(retry_args)
  File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 304, in HandleExceptionsAndRebuildHttpConnections
    raise retry_args.exc
  File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 348, in MakeRequest
    return _MakeRequestNoRetry(
  File "/opt/py38/lib64/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 397, in _MakeRequestNoRetry
    info, content = http.request(
  File "/opt/py38/lib64/python3.8/site-packages/google_auth_httplib2.py", line 209, in request
    self.credentials.before_request(self._request, method, uri, request_headers)
  File "/opt/py38/lib64/python3.8/site-packages/google/auth/credentials.py", line 134, in before_request
    self.refresh(request)
  File "/opt/py38/lib64/python3.8/site-packages/google/auth/compute_engine/credentials.py", line 111, in refresh
    self._retrieve_info(request)
  File "/opt/py38/lib64/python3.8/site-packages/google/auth/compute_engine/credentials.py", line 87, in _retrieve_info
    info = _metadata.get_service_account_info(
  File "/opt/py38/lib64/python3.8/site-packages/google/auth/compute_engine/_metadata.py", line 234, in get_service_account_info
    return get(request, path, params={"recursive": "true"})
  File "/opt/py38/lib64/python3.8/site-packages/google/auth/compute_engine/_metadata.py", line 150, in get
    response = request(url=url, method="GET", headers=_METADATA_HEADERS)
  File "/opt/py38/lib64/python3.8/site-packages/google_auth_httplib2.py", line 119, in __call__
    response, data = self.http.request(
  File "/opt/py38/lib64/python3.8/site-packages/httplib2/__init__.py", line 1701, in request
    (response, content) = self._request(
  File "/opt/py38/lib64/python3.8/site-packages/httplib2/__init__.py", line 1421, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/opt/py38/lib64/python3.8/site-packages/httplib2/__init__.py", line 1343, in _conn_request
    conn.connect()
  File "/opt/py38/lib64/python3.8/site-packages/httplib2/__init__.py", line 1026, in connect
    self.sock.connect((self.host, self.port) + sa[2:])
  File "/opt/py38/lib64/python3.8/site-packages/httplib2/socks.py", line 504, in connect
    self.__negotiatehttp(destpair[0], destpair[1])
  File "/opt/py38/lib64/python3.8/site-packages/httplib2/socks.py", line 465, in __negotiatehttp
    raise HTTPError((statuscode, statusline[2]))

My service account has these roles:

BigQuery Data Editor, BigQuery User, Dataflow Developer, Dataflow Worker, Service Account User, Storage Admin

The instance has its Cloud API access scopes set to: Allow full access to all Cloud APIs

What is the problem?

Upvotes: 0

Views: 617

Answers (2)

WingSpring

Reputation: 11

I encountered a similar problem when using Python Flask for the backend and Keycloak for authentication. The cause was the proxy setting, as mentioned above. I solved it by unsetting the existing proxy variables before starting the Flask app:

unset http_proxy
unset https_proxy
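
If the variables cannot be unset in the shell, the same effect can probably be had from inside the Python process before any HTTP client is created. This is only a sketch: the helper names `without_proxy_vars` and `clear_proxy_vars` are made up for illustration, and including the upper-case spellings is an assumption, since different libraries read different variants.

```python
import os

# Proxy variables to clear. The upper-case spellings are an assumption:
# some tools read only the lower-case names, others read both.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def without_proxy_vars(env):
    """Return a copy of env with the proxy variables removed (env is not modified)."""
    return {key: value for key, value in env.items() if key not in PROXY_VARS}


def clear_proxy_vars():
    """Remove the proxy variables from this process's environment, if present."""
    for name in PROXY_VARS:
        os.environ.pop(name, None)
```

Calling `clear_proxy_vars()` at the very top of the entry script, before the app or any client library is imported, should mirror the two `unset` commands. Note that this removes the proxy for every request, so it only fits setups where nothing needs the proxy.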

Upvotes: 0

kiran mathew

Reputation: 2363

Based on @luca's comment, the error was resolved by using an internal proxy that allows access to the internet. Add --no_use_public_ip to the command and set no_proxy="metadata.google.internal,www.googleapis.com,dataflow.googleapis.com,bigquery.googleapis.com" so that requests to the metadata server and the Google APIs bypass the proxy.
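
As a rough sketch, the no_proxy value can also be set from the launching script itself, before the pipeline is built. The helper names `merge_no_proxy` and `apply_no_proxy` are hypothetical, and setting both the lower- and upper-case variable is an assumption made to cover clients that read only one of them; the host list is the one given above.

```python
import os

# Hosts that should bypass the proxy, taken from the no_proxy value above.
BYPASS_HOSTS = [
    "metadata.google.internal",
    "www.googleapis.com",
    "dataflow.googleapis.com",
    "bigquery.googleapis.com",
]


def merge_no_proxy(existing, hosts):
    """Append hosts to a comma-separated no_proxy value, skipping duplicates."""
    merged = [item.strip() for item in existing.split(",") if item.strip()]
    for host in hosts:
        if host not in merged:
            merged.append(host)
    return ",".join(merged)


def apply_no_proxy(environ=os.environ):
    """Write the merged value to both no_proxy and NO_PROXY and return it."""
    value = merge_no_proxy(environ.get("no_proxy", ""), BYPASS_HOSTS)
    environ["no_proxy"] = value
    environ["NO_PROXY"] = value  # assumption: some clients read only this spelling
    return value
```

Calling `apply_no_proxy()` at the start of the script keeps any entries already present in no_proxy and adds the four hosts, which is a little safer than overwriting the variable outright.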

Upvotes: 2
