zakdances

Reputation: 23735

ERROR: No matching distribution found for apache-beam==2.34.0

I'm trying to run a simple Dataflow pipeline. After finally resolving some service-account permission errors, the pipeline now fails at the next stage. This time I'm even less sure how I'm supposed to read or debug the output log.

When I run the script locally, this is the output:

ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
ERROR: No matching distribution found for apache-beam==2.34.0
WARNING:apache_beam.runners.portability.stager:Failed to download requested binary distribution of the SDK: RuntimeError('Full traceback: Traceback (most recent call last):\n  File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/site-packages/apache_beam/utils/processes.py", line 89, in check_output\n    out = subprocess.check_output(*args, **kwargs)\n  File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/subprocess.py", line 424, in check_output\n    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,\n  File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/subprocess.py", line 528, in run\n    raise CalledProcessError(retcode, process.args,\nsubprocess.CalledProcessError: Command \'[\'/usr/local/Caskroom/miniconda/base/envs/myenv4/bin/python\', \'-m\', \'pip\', \'download\', \'--dest\', \'/var/folders/_3/tk69j41x2t9cvh0dbvzdmm2m0000gn/T/tmpr51_u_l1\', \'apache-beam==2.34.0\', \'--no-deps\', \'--only-binary\', \':all:\', \'--python-version\', \'39\', \'--implementation\', \'cp\', \'--abi\', \'cp39\', \'--platform\', \'manylinux1_x86_64\']\' returned non-zero exit status 1.\n \n Pip install failed for package: apache-beam==2.34.0           \n Output from execution of subprocess: b\'\'')
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.9 interpreter.

On GKE, this is my output:

[server]Traceback (most recent call last):
[server]  File "/app/shared/to_db.py", line 101, in <module>
[server]    beamer()
[server]  File "/app/shared/to_db.py", line 91, in beamer
[server]    quotes | beam.io.WriteToBigQuery(
[server]  File "/usr/local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 597, in __exit__
[server]    self.result.wait_until_finish()
[server]  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1640, in wait_until_finish
[server]    raise DataflowRuntimeException(
[server]apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
[server]Workflow failed.
Streaming logs from pod: python-property-tax-84ff6c46f6-qxh5h container: server
[server]/usr/local/lib/python3.9/site-packages/apache_beam/__init__.py:79: UserWarning: This version of Apache Beam has not been sufficiently tested on Python 3.9. You may encounter bugs or missing features.
[server]  warnings.warn(
[server]/usr/local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2103: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
[server]  is_streaming_pipeline = p.options.view_as(StandardOptions).streaming
[server]/usr/local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery_file_loads.py:1112: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
[server]  temp_location = p.options.view_as(GoogleCloudOptions).temp_location
[server]ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
[server]ERROR: No matching distribution found for apache-beam==2.34.0
[server]WARNING:apache_beam.runners.portability.stager:Failed to download requested binary distribution of the SDK: RuntimeError('Full traceback: Traceback (most recent call last):\n  File "/usr/local/lib/python3.9/site-packages/apache_beam/utils/processes.py", line 89, in check_output\n    out = subprocess.check_output(*args, **kwargs)\n  File "/usr/local/lib/python3.9/subprocess.py", line 424, in check_output\n    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,\n  File "/usr/local/lib/python3.9/subprocess.py", line 528, in run\n    raise CalledProcessError(retcode, process.args,\nsubprocess.CalledProcessError: Command \'[\'/usr/local/bin/python\', \'-m\', \'pip\', \'download\', \'--dest\', \'/tmp/tmpyx3iprn_\', \'apache-beam==2.34.0\', \'--no-deps\', \'--only-binary\', \':all:\', \'--python-version\', \'39\', \'--implementation\', \'cp\', \'--abi\', \'cp39\', \'--platform\', \'manylinux1_x86_64\']\' returned non-zero exit status 1.\n \n Pip install failed for package: apache-beam==2.34.0           \n Output from execution of subprocess: b\'\'')
[server]WARNING:root:Make sure that locally built Python SDK docker image has Python 3.9 interpreter.

As you can see, running locally and running on GKE produce similar errors. I don't see anything wrong in my Dockerfile:

# Python image to use.
FROM python:3.9

# Set the working directory to /app
WORKDIR /app

# copy the requirements file used for dependencies
# COPY requirements.txt .

RUN apt-get update
RUN apt-get install -y gdal-bin

# Install any needed packages specified in requirements.txt
RUN pip install --upgrade pip
# If I don't redundantly install here, Python gives me an "apache-beam: import not found" error
RUN pip install apache-beam
RUN pip install "apache-beam[gcp]"
RUN pip install poetry

# Copy the rest of the working directory contents into the container at /app
COPY . .

RUN poetry install

# Run shared/to_db.py when the container launches
ENTRYPOINT ["python", "shared/to_db.py"]

In pyproject.toml:

[tool.poetry.dependencies]
# ...
python = ">=3.9,<3.11"
google-cloud-bigquery = "^2.30.1"
BigQuery-Python = "^1.15.0"
apache-beam = {extras = ["gcp"], version = "^2.34.0"}
wheel = "^0.37.0"

Google's own documentation says Dataflow supports Beam v2.34.0. So why am I getting the following?

ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
ERROR: No matching distribution found for apache-beam==2.34.0

Upvotes: 0

Views: 2804

Answers (1)

Alexander

Reputation: 374

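Try changing the Python version to 3.8, which is the latest Python version supported by this release of Apache Beam. The pip download command in your log requests a cp39 wheel (--python-version 39, --abi cp39), and "from versions: none" suggests no such wheel is published for apache-beam 2.34.0. In your setup that means basing the image on python:3.8 instead of python:3.9 and relaxing the python constraint in pyproject.toml to allow 3.8. See https://beam.apache.org/get-started/quickstart-py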
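
To see why the interpreter version matters, it can help to rebuild the pip download call that appears in the stager warning and compare the interpreter against the Python versions the Beam release targets. The following is a minimal sketch, not Beam's own code: the function names, the default dest path, and the ASSUMED_SUPPORTED set are hypothetical, and the set reflects my understanding of the 2.34.0 release, so check the release's PyPI page before relying on it.

```python
import sys

# Assumption: CPython versions that apache-beam 2.34.0 publishes wheels for.
# This set is illustrative; verify it against the release's PyPI page.
ASSUMED_SUPPORTED = {(3, 6), (3, 7), (3, 8)}


def stager_download_command(beam_version, py_version, dest="/tmp/beam-sdk"):
    """Rebuild the pip download call shown in the stager warning.

    The flags mirror the command printed in the log above; the dest
    default is a hypothetical path chosen for this sketch.
    """
    major, minor = py_version[:2]
    tag = f"{major}{minor}"
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", dest,
        f"apache-beam=={beam_version}",
        "--no-deps",
        "--only-binary", ":all:",
        "--python-version", tag,
        "--implementation", "cp",
        "--abi", f"cp{tag}",
        "--platform", "manylinux1_x86_64",
    ]


def is_assumed_supported(py_version):
    """Return True if (major, minor) is in the assumed supported set."""
    return tuple(py_version[:2]) in ASSUMED_SUPPORTED


if __name__ == "__main__":
    current = tuple(sys.version_info[:2])
    print("interpreter:", current)
    print("in assumed supported set:", is_assumed_supported(current))
    # Print the command rather than running it, so the sketch has no
    # network side effects; copy it into a shell to reproduce the error.
    print(" ".join(stager_download_command("2.34.0", current)))
```

Run under Python 3.9, the printed command asks pip for a cp39 binary wheel only, which matches the failing command in the log. Under 3.8 the same call asks for cp38 instead.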

Upvotes: 3
