Reputation: 30045
Working on raster files and need the gdal package. Trying to install it on Azure Databricks throws the errors below. Any clue how to get this installed on Databricks?
Collecting gdal
  Using cached GDAL-3.0.4.tar.gz (577 kB)
ERROR: Command errored out with exit status 1:
 command: /databricks/python3/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-i3yomji8/gdal/setup.py'"'"'; __file__='"'"'/tmp/pip-install-i3yomji8/gdal/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-i3yomji8/gdal/pip-egg-info
     cwd: /tmp/pip-install-i3yomji8/gdal/
Complete output (72 lines):
running egg_info
creating /tmp/pip-install-i3yomji8/gdal/pip-egg-info/GDAL.egg-info
writing /tmp/pip-install-i3yomji8/gdal/pip-egg-info/GDAL.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-install-i3yomji8/gdal/pip-egg-info/GDAL.egg-info/dependency_links.txt
writing top-level names to /tmp/pip-install-i3yomji8/gdal/pip-egg-info/GDAL.egg-info/top_level.txt
writing manifest file '/tmp/pip-install-i3yomji8/gdal/pip-egg-info/GDAL.egg-info/SOURCES.txt'
Traceback (most recent call last):
  File "/tmp/pip-install-i3yomji8/gdal/setup.py", line 151, in fetch_config
    p = subprocess.Popen([command, args], stdout=subprocess.PIPE)
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '../../apps/gdal-config': '../../apps/gdal-config'
Upvotes: 0
Views: 1457
Reputation: 12768
There are different methods to install packages in Azure Databricks:
Method 1: Using libraries
To make third-party or locally-built code available to notebooks and jobs running on your clusters, you can install a library. Libraries can be written in Python, Java, Scala, and R. You can upload Java, Scala, and Python libraries and point to external packages in PyPI, Maven, and CRAN repositories.
Steps to install third-party libraries:
Step 1: Create a Databricks cluster.
Step 2: Select the cluster you created.
Step 3: Select Libraries => Install New => Library Source = "Maven" => Coordinates => Search Packages => Select Maven Central => search for the required package (for example, GDAL) => select the required version (3.0.0) => Install. (If you prefer to script this instead of clicking through the UI, see the sketch after these steps.)
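The same install can also be triggered through the Databricks Libraries REST API (POST /api/2.0/libraries/install). A minimal sketch, assuming placeholder values for the workspace URL, personal access token, and cluster ID (all hypothetical here):
# Install a Maven library on a running cluster via the Libraries API 2.0.
# <your-workspace>, <personal-access-token>, and <cluster-id> are placeholders.
import requests

host = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": "<cluster-id>",
        "libraries": [{"maven": {"coordinates": "org.gdal:gdal:3.0.0"}}],
    },
)
resp.raise_for_status()  # the API returns an empty JSON object on success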
Method 2: Using cluster-scoped init scripts
Cluster-scoped init scripts are init scripts defined in a cluster configuration. They apply both to clusters you create and to those created to run jobs. Since the scripts are part of the cluster configuration, cluster access control lets you control who can change them.
Step 1: Run the snippet below once to create the script, then add the DBFS path dbfs:/databricks/scripts/gdal_install.sh to the cluster's init scripts.
# --- Run once to set up the init script. ---
# Restart the cluster after running.
dbutils.fs.put("/databricks/scripts/gdal_install.sh","""
#!/bin/bash
# -y keeps both commands non-interactive; init scripts cannot answer prompts
sudo add-apt-repository -y ppa:ubuntugis/ppa
sudo apt-get update
sudo apt-get install -y cmake gdal-bin libgdal-dev python3-gdal""",
True)
Step 2: Restart the cluster after running Step 1 for the first time.
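Once the cluster has restarted, a quick sanity check from a notebook confirms the install. A minimal sketch, assuming the apt-installed python3-gdal bindings are visible on the notebook's Python path (if they are not, you may additionally need to pip install a gdal version matching gdal-config --version):
# Verify that the GDAL Python bindings installed by the init script import.
from osgeo import gdal

# VersionInfo() returns the compiled GDAL version as a string, e.g. '3000400'
print(gdal.VersionInfo())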
For more details, refer to the "RasterFrames Notebook".
Hope this helps. Do let us know if you have any further queries.
Upvotes: 1