G-Unit
Reputation: 1

Irresolvable Build Service Bug due to OpenSSL version

Using the PySpark pandas_udf decorator (pyspark.sql.functions.pandas_udf) results in a dependency error in the Foundry Build Service environment that I have not been able to resolve. The decorator requires pyarrow, which is linked against an OpenSSL version that the build environment does not provide, and adding openssl to the user/project environment via meta.yaml does not fix the problem. stdout:

ImportError: PyArrow >= 4.0.0 must be installed; however, it was not found.

Traceback (most recent call last):
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/pyspark/sql/pandas/utils.py", line 53, in require_minimum_pyarrow_version
    import pyarrow
  File "/app/work-dir/__environment__/__SYMLINKS__/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: /app/work-dir/__python_runtime_environment__/__SYMLINKS__/lib-dynload/../../libcrypto.so.3: version `OPENSSL_3.4.0' not found (required by /app/work-dir/__environment__/__SYMLINKS__/site-packages/pyarrow/../../../././libssl.so.3)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/transforms_spark_module/delegate.py", line 100, in _execute_job
    result = job.run(
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/transforms/_build.py", line 329, in run
    self._transform.compute(**kwargs, **parameters)
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/transforms/api/_transform.py", line 334, in compute
    output_df: Union[DataFrame, Any] = self(**kwargs)
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/transforms/api/_transform.py", line 183, in __call__
    return self._compute_func(*args, **kwargs)
  File "/app/work-dir/__user_code_environment__/__SYMLINKS__/site-packages/myproject/datasets/data_anon.py", line 15, in compute
    "case_weight": redist(df, "case_weight"),
  File "/app/work-dir/__user_code_environment__/__SYMLINKS__/site-packages/myproject/datasets/utils.py", line 32, in redist
    df = df.withColumn(column_name, add_noise(column_name, dist))
  File "/app/work-dir/__user_code_environment__/__SYMLINKS__/site-packages/myproject/datasets/utils.py", line 14, in add_noise
    @F.pandas_udf(DoubleType())
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/pyspark/sql/pandas/functions.py", line 338, in pandas_udf
    require_minimum_pyarrow_version()
  File "/app/work-dir/__python_runtime_environment__/__SYMLINKS__/site-packages/pyspark/sql/pandas/utils.py", line 60, in require_minimum_pyarrow_version
    raise ImportError(
ImportError: PyArrow >= 4.0.0 must be installed; however, it was not found.

Upvotes: 0

Views: 47

Answers (1)

The PySpark error message is misleading. As you correctly identified, the real cause is an OpenSSL version mismatch between the Conda environment and the Foundry build environment: pyarrow is installed, but importing it fails while loading the OpenSSL shared libraries.
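If you want to confirm this yourself, a minimal sketch along these lines may help. It imports pyarrow directly so the original ImportError is shown instead of PySpark's generic "not found" message, and it prints the OpenSSL version the Python interpreter is linked against. The helper name diagnose_import is made up for illustration, and where you run this (for example at the top of the transform or in a preview) is up to you.

```python
import importlib
import ssl


def diagnose_import(module_name: str) -> str:
    """Try to import a module and describe the outcome in one line.

    Hypothetical helper for illustration; not part of PySpark or Foundry.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the original message, e.g. a missing OPENSSL_x.y.z symbol version,
        # which PySpark's require_minimum_pyarrow_version() replaces with
        # "PyArrow >= 4.0.0 must be installed; however, it was not found."
        return f"{module_name}: import failed: {exc}"
    version = getattr(module, "__version__", "unknown")
    return f"{module_name}: OK (version {version})"


# OpenSSL build that this Python interpreter's ssl module is linked against.
print("Python ssl module uses:", ssl.OPENSSL_VERSION)
print(diagnose_import("pyarrow"))
```

If the second line reports a failed import that mentions libcrypto or libssl, the problem is the OpenSSL mismatch rather than a missing pyarrow package.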

Newer versions of Python Transforms ship with OpenSSL 3.4.0, so you can fix this issue by making sure you don't pin the openssl version in your meta.yaml file and by upgrading your repository to the latest template version.
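To check for a pin quickly, something like the sketch below could be run against the text of your meta.yaml. It is a plain line scan rather than a YAML parser (conda recipes often contain Jinja templating), the function name find_openssl_pins is made up for illustration, and the sample recipe is hypothetical, not taken from your repository.

```python
import re

# Matches a conda requirement line for the openssl package, such as
# "- openssl 3.0.*" or "- openssl=1.1.1". Group 1 is the version constraint
# (empty when the package is listed without one). The lookahead keeps
# packages like "openssl-devel" from matching.
_OPENSSL_LINE = re.compile(
    r"^\s*-\s*openssl(?![\w.-])\s*(.*?)\s*(?:#.*)?$", re.IGNORECASE
)


def find_openssl_pins(meta_yaml_text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each openssl entry with a version constraint."""
    pins = []
    for lineno, line in enumerate(meta_yaml_text.splitlines(), start=1):
        match = _OPENSSL_LINE.match(line)
        if match and match.group(1):
            pins.append((lineno, line.strip()))
    return pins


# Hypothetical recipe fragment for illustration only.
sample = """\
requirements:
  run:
    - python
    - pyarrow
    - openssl 3.0.*
"""

for lineno, line in find_openssl_pins(sample):
    print(f"line {lineno}: remove the version constraint from '{line}'")
```

Any line it reports is a candidate to unpin (or remove) before upgrading the repository template.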

Upvotes: 0
