Reputation: 1397
I have mounted a Blob Storage account into Databricks and can access it fine, so I know that it works.
What I want to do, though, is list the names of all of the files at a given path. Currently I'm doing this with:
list = dbutils.fs.ls('dbfs:/mnt/myName/Path/To/Files/2019/03/01')
df = spark.createDataFrame(list).select('name')
The issue I have, though, is that it's exceptionally slow, because there are around 160,000 blobs at that location (Storage Explorer shows this as ~1016106592 bytes, which is about 1 GB!).
Surely this can't be pulling down all of that data when all I need/want is the filename.
Is Blob Storage my bottleneck, or can I (somehow) get Databricks to execute the command in parallel?
Thanks.
Upvotes: 2
Views: 9667
Reputation: 24138
In my experience, and based on my understanding of Azure Blob Storage, every operation an SDK (or anything else) performs against Blob Storage is translated into REST API calls. Your dbutils.fs.ls call is therefore really invoking the List Blobs REST API against the blob container.
So I'm sure the performance bottleneck of your code is the transfer of the sizable XML response body of the blob listing from Blob Storage, just so the blob names can be extracted into your list variable, given that there are around 160,000 blobs.
Meanwhile, the blob names come back wrapped in many pages of XML responses: there is a MaxResults limit per page, and fetching the next page depends on the NextMarker value returned with the previous one. That is why listing blobs is slow and why it cannot be parallelized (see the sketch below).
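For reference, here is a minimal sketch of what that listing looks like when driven directly from the azure-storage-blob SDK (v12), assuming a placeholder connection string and that the myName mount maps to a container of the same name. The SDK follows NextMarker for you, but the pages are still fetched one after another:

from azure.storage.blob import ContainerClient

# Placeholder connection details -- substitute your own storage account values.
container = ContainerClient.from_connection_string(
    "<your-connection-string>", container_name="myName")

# list_blobs() pages through the XML listing, following NextMarker under the
# hood; only blob metadata (including the name) is returned, but each page
# must still be fetched sequentially.
names = [blob.name for blob in
         container.list_blobs(name_starts_with="Path/To/Files/2019/03/01/")]
print(len(names))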
My suggestion for making the blob listing faster to load is to cache it in advance, for example by writing the blob names, one per line, into a separate blob. To keep that cache up to date in real time, you can use an Azure Function with a Blob Trigger that appends each new blob name to an Append Blob whenever a blob-creation event fires, roughly as sketched below.
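Here is a rough sketch of that trigger, assuming the Python programming model for Azure Functions (the blobTrigger binding named myblob in function.json is omitted); the metadata container and blob-index.txt names are placeholders:

import os
import logging
import azure.functions as func
from azure.storage.blob import AppendBlobClient

def main(myblob: func.InputStream):
    # For a Blob Trigger, myblob.name is "<container>/<blob path>".
    blob_path = myblob.name.split("/", 1)[1]

    # Placeholder index blob; AzureWebJobsStorage is the Function App's
    # default storage connection string setting.
    index = AppendBlobClient.from_connection_string(
        os.environ["AzureWebJobsStorage"],
        container_name="metadata",
        blob_name="blob-index.txt")

    # Create the append blob once, then append one line per new blob.
    if not index.exists():
        index.create_append_blob()
    index.append_block((blob_path + "\n").encode("utf-8"))

    logging.info("Indexed new blob: %s", blob_path)

Your Databricks job can then read that single index blob (one name per line) instead of paging through the List Blobs API.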
Upvotes: 2