Reputation: 894
I'm working with Azure Databricks and Blob Storage. I have a storage account that stores data from IoT devices every hour, so the folder structure is {year/month/day/hour} and the data is stored as CSV files. My requirement is that I need to access the files from Azure Databricks on a daily basis (so there will be 24 folders, starting from 0-23) and perform some calculations.
Upvotes: 2
Views: 1824
Reputation: 2448
In order to process many files under a wasb container you'll need to use the Hadoop Input Format glob patterns. The patterns are as follows, somewhat similar to regex (see the illustration after the list):
* (matches zero or more characters)
? (matches a single character)
[ab] (character class)
[^ab] (negated character class)
[a-b] (character range)
{a,b} (alternation)
\c (escape character)
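As a rough illustration of how these pieces combine with the year/month/day/hour layout, here are a few hypothetical reads; the container root, the dates, and the zero-padding of the month/day folders are assumptions on my part:

# every CSV written during 2019, regardless of month/day/hour
df_2019 = spark.read.format("csv").load("/container/2019/*/*/*/*.csv")

# CSVs from days 01-07 of May 2019 (character range, assuming zero-padded day folders)
df_week = spark.read.format("csv").load("/container/2019/05/0[1-7]/*/*.csv")

# only the 0, 6, 12 and 18 o'clock hour folders of 1 May 2019 (alternation)
df_hours = spark.read.format("csv").load("/container/2019/05/01/{0,6,12,18}/*.csv")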
For your use case, the following should work (one wildcard per folder level: year, month, day, hour):
df = spark.read.format("csv").load("/container/*/*/*/*/*.csv")
Upvotes: 2