Reputation: 37
We have just created a new Azure Databricks resource in our resource group. In the same resource group there is an old Azure Databricks instance. I would like to copy the data stored in the DBFS of this old Databricks instance into the new one. How can I do that? My idea is to use FS commands to copy or move data from one DBFS to the other, probably by mounting the volumes, but I can't figure out how to do it. Do you have any suggestions?
Thanks, Francesco
Upvotes: 1
Views: 4783
Reputation: 12768
Unfortunately, there is no direct method to export and import files/folders from one workspace to another workspace.
Note: It is highly recommended not to store any production data in the default DBFS folders.
How to copy files/folders from one workspace to another workspace?
You need to manually download files/folders from one workspace and upload files/folders to another workspace.
The easiest way is to use DBFS Explorer (screenshot: https://i.sstatic.net/umF9y.jpg).
Download file/folder from DBFS to the local machine:
Method 1: Using the Databricks CLI
The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:
# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana
Reference: Installing and configuring Azure Databricks CLI and Azure Databricks – Access DBFS
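Since the goal here is to move data between two workspaces, one way to chain these commands is to define a separate CLI profile per workspace and pull the data down to a local staging folder first. Below is a minimal sketch assuming the legacy Databricks CLI (pip install databricks-cli) is installed; the profile name old-workspace and the path dbfs:/data are placeholders for your own values:
# Register a CLI profile for the old workspace (prompts for the workspace URL and a personal access token)
databricks configure --token --profile old-workspace
# Recursively download a DBFS folder from the old workspace to a local staging directory
dbfs cp -r dbfs:/data ./dbfs-export --profile old-workspace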
Method 2: Using a third-party tool named DBFS Explorer
DBFS Explorer was created as a quick way to upload files to and download files from the Databricks File System (DBFS). It works with both AWS and Azure instances of Databricks. You will need to create a bearer token in the web interface in order to connect.
Upload file/folder from the local machine to DBFS:
There are multiple ways to upload files from a local machine to the Azure Databricks DBFS folder.
Method 1: Using the Azure Databricks portal.
Method 2: Using the Databricks CLI
The same dbfs cp commands shown in the download section above apply in the upload direction (local machine → DBFS).
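To finish the migration, the staged folder can then be pushed into the new workspace's DBFS with a second CLI profile. This is a sketch under the same assumptions as above (the profile name new-workspace and the paths are placeholders):
# Register a CLI profile for the new workspace
databricks configure --token --profile new-workspace
# Recursively upload the staged folder into the new workspace's DBFS, overwriting any existing files
dbfs cp -r --overwrite ./dbfs-export dbfs:/data --profile new-workspace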
Method 3: Using a third-party tool named DBFS Explorer
This is the same tool described in the download section above.
Step 1: Download and install DBFS Explorer.
Step 2: Open DBFS Explorer and enter the Databricks URL and a personal access token.
Step 3: Select the DBFS folder you want to upload to, drag and drop the files/folders from the local machine into it, and click Upload.
Upvotes: 2