Reputation: 23735
I'm trying to run a simple Dataflow pipeline. After finally silencing some service-account permission errors, my pipeline has now progressed to the next stage of failure. This time, however, I'm even less clear on how I'm supposed to read and debug the output log.
Running the script locally, this is my output:
ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
ERROR: No matching distribution found for apache-beam==2.34.0
WARNING:apache_beam.runners.portability.stager:Failed to download requested binary distribution of the SDK: RuntimeError('Full traceback: Traceback (most recent call last):\n File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/site-packages/apache_beam/utils/processes.py", line 89, in check_output\n out = subprocess.check_output(*args, **kwargs)\n File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/subprocess.py", line 424, in check_output\n return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,\n File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/subprocess.py", line 528, in run\n raise CalledProcessError(retcode, process.args,\nsubprocess.CalledProcessError: Command \'[\'/usr/local/Caskroom/miniconda/base/envs/myenv4/bin/python\', \'-m\', \'pip\', \'download\', \'--dest\', \'/var/folders/_3/tk69j41x2t9cvh0dbvzdmm2m0000gn/T/tmpr51_u_l1\', \'apache-beam==2.34.0\', \'--no-deps\', \'--only-binary\', \':all:\', \'--python-version\', \'39\', \'--implementation\', \'cp\', \'--abi\', \'cp39\', \'--platform\', \'manylinux1_x86_64\']\' returned non-zero exit status 1.\n \n Pip install failed for package: apache-beam==2.34.0 \n Output from execution of subprocess: b\'\'')
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.9 interpreter.
On GKE, this is my output:
[server]Traceback (most recent call last):
[server] File "/app/shared/to_db.py", line 101, in <module>
[server] beamer()
[server] File "/app/shared/to_db.py", line 91, in beamer
[server] quotes | beam.io.WriteToBigQuery(
[server] File "/usr/local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 597, in __exit__
[server] self.result.wait_until_finish()
[server] File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1640, in wait_until_finish
[server] raise DataflowRuntimeException(
[server]apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
[server]Workflow failed.
Streaming logs from pod: python-property-tax-84ff6c46f6-qxh5h container: server
[server]/usr/local/lib/python3.9/site-packages/apache_beam/__init__.py:79: UserWarning: This version of Apache Beam has not been sufficiently tested on Python 3.9. You may encounter bugs or missing features.
[server] warnings.warn(
[server]/usr/local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2103: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
[server] is_streaming_pipeline = p.options.view_as(StandardOptions).streaming
[server]/usr/local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery_file_loads.py:1112: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
[server] temp_location = p.options.view_as(GoogleCloudOptions).temp_location
[server]ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
[server]ERROR: No matching distribution found for apache-beam==2.34.0
[server]WARNING:apache_beam.runners.portability.stager:Failed to download requested binary distribution of the SDK: RuntimeError('Full traceback: Traceback (most recent call last):\n File "/usr/local/lib/python3.9/site-packages/apache_beam/utils/processes.py", line 89, in check_output\n out = subprocess.check_output(*args, **kwargs)\n File "/usr/local/lib/python3.9/subprocess.py", line 424, in check_output\n return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,\n File "/usr/local/lib/python3.9/subprocess.py", line 528, in run\n raise CalledProcessError(retcode, process.args,\nsubprocess.CalledProcessError: Command \'[\'/usr/local/bin/python\', \'-m\', \'pip\', \'download\', \'--dest\', \'/tmp/tmpyx3iprn_\', \'apache-beam==2.34.0\', \'--no-deps\', \'--only-binary\', \':all:\', \'--python-version\', \'39\', \'--implementation\', \'cp\', \'--abi\', \'cp39\', \'--platform\', \'manylinux1_x86_64\']\' returned non-zero exit status 1.\n \n Pip install failed for package: apache-beam==2.34.0 \n Output from execution of subprocess: b\'\'')
[server]WARNING:root:Make sure that locally built Python SDK docker image has Python 3.9 interpreter.
As you can see, running locally and on GKE produces similar errors. I don't see anything wrong in my Dockerfile:
# Python image to use.
FROM python:3.9
# Set the working directory to /app
WORKDIR /app
# copy the requirements file used for dependencies
# COPY requirements.txt .
RUN apt-get update
RUN apt-get install -y gdal-bin
# Install any needed packages specified in requirements.txt
RUN pip install --upgrade pip
# If I don't redundantly install here, Python gives me an "apache-beam: import not found" error
RUN pip install apache-beam
RUN pip install "apache-beam[gcp]"
RUN pip install poetry
# Copy the rest of the working directory contents into the container at /app
COPY . .
RUN poetry install
# Run shared/to_db.py when the container launches
ENTRYPOINT ["python", "shared/to_db.py"]
In pyproject.toml:
[tool.poetry.dependencies]
# ...
python = ">=3.9,<3.11"
google-cloud-bigquery = "^2.30.1"
BigQuery-Python = "^1.15.0"
apache-beam = {extras = ["gcp"], version = "^2.34.0"}
wheel = "^0.37.0"
Google's own documentation says Dataflow supports Beam v2.34.0. So why am I getting:
ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
ERROR: No matching distribution found for apache-beam==2.34.0
Upvotes: 0
Views: 2804
Reputation: 374
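Try changing the Python version to 3.8 (for example, by using the python:3.8 base image instead of python:3.9 in the Dockerfile). Python 3.8 is the latest version that Apache Beam 2.34.0 supports: https://beam.apache.org/get-started/quickstart-py
The log shows the stager running pip download with --only-binary :all:, --python-version 39 and --abi cp39. With no Python 3.9 wheel available for apache-beam 2.34.0, pip has nothing to choose from, which is why it reports "(from versions: none)".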
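To read the failing log line, it can help to work out which wheel the stager is actually asking pip for. Below is a minimal sketch, not part of Beam itself: `requested_wheel_tag` and `interpreter_is_supported` are hypothetical helper names, and `ASSUMED_MAX_PYTHON` is an assumption taken from the statement above that 3.8 is the newest Python supported by Beam 2.34.0. Check it against the release you pin.

```python
import sys

# Assumption: newest Python minor version with wheels for apache-beam 2.34.0.
# Adjust this if you pin a different Beam release.
ASSUMED_MAX_PYTHON = (3, 8)


def requested_wheel_tag(pip_args):
    """Return the wheel tag implied by a `pip download` argument list."""
    def value_of(flag):
        # Each of these flags is followed by its value in the list.
        return pip_args[pip_args.index(flag) + 1]

    impl = value_of("--implementation")    # "cp" in the log
    pyver = value_of("--python-version")   # "39" in the log
    abi = value_of("--abi")                # "cp39" in the log
    platform = value_of("--platform")      # "manylinux1_x86_64" in the log
    return f"{impl}{pyver}-{abi}-{platform}"


def interpreter_is_supported(version=None, max_version=ASSUMED_MAX_PYTHON):
    """True if the (major, minor) version does not exceed the assumed maximum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) <= max_version


# Arguments copied from the CalledProcessError in the GKE log.
LOGGED_ARGS = [
    "-m", "pip", "download", "--dest", "/tmp/tmpyx3iprn_",
    "apache-beam==2.34.0", "--no-deps", "--only-binary", ":all:",
    "--python-version", "39", "--implementation", "cp",
    "--abi", "cp39", "--platform", "manylinux1_x86_64",
]

if __name__ == "__main__":
    print("wheel tag requested:", requested_wheel_tag(LOGGED_ARGS))
    print("this interpreter supported:", interpreter_is_supported())
```

Calling `interpreter_is_supported()` at the top of the pipeline script and raising early would surface the version mismatch before the job is submitted, rather than as a pip warning buried in the stager output.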
Upvotes: 3