demiurgo86

Reputation: 37

Azure Databricks - Export and Import DBFS filesystem

We have just created a new Azure Databricks resource in our resource group, which also contains an older Azure Databricks instance. I would like to copy the data stored in the old instance's DBFS into the new instance. How can I do that? My idea is to use FS commands to copy or move the data from one DBFS to the other, probably by mounting the volumes, but I can't figure out how to do it. Do you have any suggestions?

Thanks, Francesco

Upvotes: 1

Views: 4783

Answers (1)

CHEEKATLAPRADEEP

Reputation: 12768

Unfortunately, there is no direct method to export and import files/folders from one workspace to another.

Note: It is highly recommended that you do not store any production data in the default DBFS folders.

How to copy files/folders from one workspace to another workspace?

You need to manually download the files/folders from one workspace and then upload them to the other workspace.
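
Putting the two halves together with the Databricks CLI (Method1 below), the whole migration is a recursive download followed by a recursive upload. Here is a minimal sketch, assuming the CLI has been configured with one connection profile per workspace and that your CLI version supports the --profile option (the profile names old-workspace and new-workspace and the path dbfs:/data are placeholders):

# Pull a DBFS folder from the old workspace down to the local machine
dbfs cp -r dbfs:/data ./dbfs-export --profile old-workspace
# Push the local copy up into the new workspace's DBFS
dbfs cp -r ./dbfs-export dbfs:/data --profile new-workspace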

The easiest way is to use DBFS Explorer:

Screenshot: https://i.sstatic.net/umF9y.jpg

Download file/folder from DBFS to the local machine:

Method1: Using Databricks CLI

The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS with commands similar to those you would use on a Unix command line. For example:

# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana

Reference: Installing and configuring Azure Databricks CLI and Azure Databricks – Access DBFS
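
If the CLI is not set up yet, here is a minimal setup sketch based on those references (this assumes the pip-installable Databricks CLI; the profile name old-workspace is a placeholder):

# Install the Databricks CLI (requires Python and pip)
pip install databricks-cli
# Create a connection profile; you will be prompted for the workspace URL and a personal access token
databricks configure --token --profile old-workspace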

Method2: Using a third-party tool named DBFS Explorer

DBFS Explorer was created as a quick way to upload and download files to and from the Databricks File System (DBFS). It works with both AWS and Azure instances of Databricks. You will need to create a bearer token in the web interface in order to connect.

Upload file/folder from the local machine to DBFS:

There are multiple ways to upload files from a local machine to the Azure Databricks DBFS folder.

Method1: Using the Azure Databricks portal.

Method2: Using Databricks CLI

As described in the download section above, the same DBFS CLI commands can be used to upload files and folders from the local machine. For example:

# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana

Method3: Using a third-party tool named DBFS Explorer

As noted above, DBFS Explorer works with both AWS and Azure instances of Databricks and requires a bearer token created in the web interface in order to connect.

Step1: Download and install DBFS Explorer.

Step2: Open DBFS Explorer and enter the Databricks URL and a personal access token.
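
For reference, an Azure Databricks workspace URL typically has the form https://adb-<workspace-id>.<digit>.azuredatabricks.net (older workspaces may use https://<region>.azuredatabricks.net), and the personal access token can be generated from the User Settings page of the workspace; the placeholders in angle brackets are illustrative, not literal values.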

Step3: Select the target DBFS folder, drag and drop the files from the local machine into it, and click Upload.

Upvotes: 2
