Lossa

Reputation: 371

Saving Matplotlib Output to Blob Storage on Databricks

I'm trying to write matplotlib figures to Azure Blob Storage using the method provided here: Saving Matplotlib Output to DBFS on Databricks.

However, when I replace the path in the code with

path = 'wasbs://<container>@<storage-account>.blob.core.windows.net/'

I get this error

[Errno 2] No such file or directory: 'wasbs://<container>@<storage-account>.blob.core.windows.net/'
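For context, the savefig call from the linked method looks roughly like this (simplified, with placeholder names):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3])
path = 'wasbs://<container>@<storage-account>.blob.core.windows.net/'
fig.savefig(path + 'fig.png')  # raises the [Errno 2] above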

I don't understand the problem...

Upvotes: 4

Views: 2941

Answers (4)

lin xiao

Reputation: 1

I didn't succeed using dbutils, which could not be created correctly. But I did succeed by mounting the file shares to a Linux path, following this guide: https://learn.microsoft.com/en-us/azure/azure-functions/scripts/functions-cli-mount-files-storage-linux

Upvotes: 0

Orhan Celik

Reputation: 1575

You can write with .savefig() directly to Azure Blob Storage; you just need to mount the blob container first.

The following works for me, where I have mounted the blob container at /mnt/mydatalakemount:

plt.savefig('/dbfs/mnt/mydatalakemount/plt.png')

or

fig.savefig('/dbfs/mnt/mydatalakemount/fig.png')

Documentation on mounting a blob container is here.
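A minimal mount sketch (the container, account, secret scope, and key names are placeholders, not from this answer):

dbutils.fs.mount(
  source = "wasbs://<container>@<storage-account>.blob.core.windows.net",
  mount_point = "/mnt/mydatalakemount",
  extra_configs = {"fs.azure.account.key.<storage-account>.blob.core.windows.net": dbutils.secrets.get(scope = "<scope>", key = "<key>")})

After that, /dbfs/mnt/mydatalakemount is writable with ordinary file APIs such as savefig.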

Upvotes: 1

Lossa

Reputation: 371

This is what I also came up with so far. To reload the image from blob storage and display it as a PNG in a Databricks notebook again, I use the following code:

from io import BytesIO
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

blob_path = ...  # wasbs://... source
dbfs_path = ...  # note: dbutils.fs.cp expects 'dbfs:/...', while open() needs the local '/dbfs/...' form
# copy the file from blob storage into DBFS
dbutils.fs.cp(blob_path, dbfs_path)

# read the copied bytes back through the local file API
with open(dbfs_path, "rb") as f:
    im = BytesIO(f.read())

img = mpimg.imread(im)
imgplot = plt.imshow(img)
display(imgplot.figure)
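Since the file ends up in DBFS, it should also be readable straight from the local /dbfs mount, which would skip the BytesIO step (a sketch; the path is an assumption, not from my original code):

img = mpimg.imread('/dbfs/tmp/fig.png')  # read directly via the local /dbfs path
plt.imshow(img)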

Upvotes: 0

CHEEKATLAPRADEEP

Reputation: 12788

As per my research, you cannot save Matplotlib output to Azure Blob Storage directly: savefig goes through the local file system APIs, which do not understand wasbs:// URIs; that is why the question's [Errno 2] appears.

You may follow the below steps to save Matplotlib output to Azure Blob Storage:

Step 1: First save the figure to the Databricks File System (DBFS), then copy it to Azure Blob Storage.

Saving Matplotlib output to the Databricks File System (DBFS): we use the command plt.savefig('/dbfs/myfolder/Graph1.png') to save the output to DBFS. Note the /dbfs/ prefix: DBFS is exposed on the driver as a local FUSE mount at /dbfs, so ordinary file APIs such as plt.savefig can write there, while a wasbs:// URI cannot be opened this way.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'fruits': ['apple', 'banana'], 'count': [1, 2]})
plt.close()  # close any figure left over from a previous cell
df.set_index('fruits', inplace=True)
df.plot.bar()
# the /dbfs/ prefix writes through the local FUSE mount into DBFS
plt.savefig('/dbfs/myfolder/Graph1.png')


Step 2: Copy the file from Databricks File System to Azure Blob Storage.

There are two methods to copy a file from DBFS to Azure Blob Storage.

Method 1: Access Azure Blob storage directly

Access Azure Blob Storage directly by setting the storage account key with spark.conf.set, then copy the file from DBFS to Blob Storage.

spark.conf.set("fs.azure.account.key.<storage-account>.blob.core.windows.net", "<storage-account-key>")

Use dbutils.fs.cp to copy the file from DBFS to Azure Blob Storage:

dbutils.fs.cp('dbfs:/myfolder/Graph1.png', 'wasbs://<container>@<storage-account>.blob.core.windows.net/Azure')


Method 2: Mount Azure Blob storage containers to DBFS

You can mount a Blob storage container or a folder inside a container to Databricks File System (DBFS). The mount is a pointer to a Blob storage container, so the data is never synced locally.

dbutils.fs.mount(
  source = "wasbs://sampledata@chepra.blob.core.windows.net/Azure",
  mount_point = "/mnt/chepra",
  extra_configs = {"fs.azure.sas.sampledata.chepra.blob.core.windows.net": dbutils.secrets.get(scope = "azurestorage", key = "azurestoragekey")})

Use dbutils.fs.cp to copy the file to the mounted Azure Blob Storage container:

dbutils.fs.cp('dbfs:/myfolder/Graph1.png', 'dbfs:/mnt/chepra/Graph1.png')
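To confirm the copy landed, you can list the mount point (a quick check, not part of the original steps):

display(dbutils.fs.ls('/mnt/chepra'))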


By following Method 1 or Method 2, you can successfully save the output to Azure Blob Storage.


For more details, refer to "Databricks - Azure Blob Storage".

Hope this helps. Do let us know if you have any further queries.

Upvotes: 3
