pgruetter

Reputation: 1214

Copy file from dbfs in cluster-scoped init script

I want to try out cluster-scoped init scripts on an Azure Databricks cluster. I'm struggling to see which commands are available.

Basically, I've got a file on dbfs that I want to copy to a local directory /tmp/config when the cluster spins up.

So I created a very simple bash script:

#!/bin/bash
mkdir -p /tmp/config
databricks fs cp dbfs:/path/to/myFile.conf /tmp/config

Spinning up the cluster fails with "Cluster terminated. Reason: Init Script Failure". Looking at the log on dbfs, I see the error

bash: line 1: databricks: command not found

OK, so the databricks command is not available. That's the command I use in my local shell to copy files to and from dbfs.

What other commands are available to copy a file from dbfs? And more generally: which commands are actually available?

Upvotes: 2

Views: 3901

Answers (2)

Mimi Müller

Reputation: 516

DBFS is mounted on the cluster nodes at /dbfs, so you can copy the file with a plain cp in your shell script:

e.g.

cp /dbfs/your-folder/your-file.txt ./your-file.txt

If you run dir on /dbfs, it lists all the folders and files you have in your DBFS.
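Putting this together for the question's scenario, the whole init script could be as simple as the following sketch (using the file path from the question):

#!/bin/bash
# /dbfs is the local mount of DBFS on each cluster node,
# so a plain cp works instead of the Databricks CLI
mkdir -p /tmp/config
cp /dbfs/path/to/myFile.conf /tmp/config/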

You can also first test it in a notebook via

%sh
cd /dbfs
dir

Upvotes: 2

CHEEKATLAPRADEEP

Reputation: 12768

By default, the Databricks CLI is not installed on the Databricks cluster. That's why you see the error message bash: line 1: databricks: command not found.

To achieve this, you should use dbutils commands as shown below.

dbutils.fs.mkdirs("/tmp/config")  # create the target directory
dbutils.fs.mv("/configuration/proxy.conf", "/tmp/config")  # move the file into it
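Note that dbutils.fs.mv moves the file out of its original location; if you want to keep the original, dbutils.fs.cp copies it instead (a sketch using the file path from the question):

dbutils.fs.cp("/path/to/myFile.conf", "/tmp/config/myFile.conf")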


Reference: Databricks Utilities

Hope this helps.

Upvotes: 1
