Reputation: 1020
Definitions of ABFS[S] and WASB[S] are easy to find, but there is no clear guidance on when to use which. What are the most appropriate use cases for each?
Upvotes: 40
Views: 71736
Reputation: 8824
1) Blob Storage with HTTP
Azure introduced Blob Storage, an object store with a flat structure. It has no concept of folders or hierarchy, although a slash (/) in a blob name gives the illusion of one.
The blob endpoint (blob.core.windows.net) can be used over HTTP to read and write blobs:
https://storageaccount.blob.core.windows.net/container/path/to/blob
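To make the "illusion of hierarchy" concrete, here is a minimal sketch in plain Python (no Azure SDK involved). The function name `list_virtual_dir` and the sample blob names are made up for illustration. It shows how "folders" can be derived from a flat list of blob names purely by splitting on a prefix and a delimiter.

```python
def list_virtual_dir(blob_names, prefix="", delimiter="/"):
    """Return (virtual_folders, blobs) found directly under `prefix`.

    The store itself is flat: every entry is just a name. A "folder" is
    only the part of a name up to the next delimiter after the prefix.
    """
    folders, blobs = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Another delimiter follows, so this looks like a sub-folder.
            folders.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return sorted(folders), sorted(blobs)


# Hypothetical blob names in a single container.
names = ["path/to/blob", "path/to/other", "path/readme.txt", "top.txt"]
print(list_virtual_dir(names))           # top level
print(list_virtual_dir(names, "path/"))  # "inside" the path/ folder
```

Nothing called `path/` exists as an object here; it only appears because some blob names happen to start with it.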
2) Blob Storage with WASBS
For Hadoop applications that need to interact with Azure Blob Storage, HDFS compatibility is provided by the WASB[S] driver. This driver performs the complex task of mapping file system semantics (as required by the Hadoop FileSystem interface) onto the object-store-style interface exposed by Azure Blob Storage.
wasbs://container@storageaccount.blob.core.windows.net/path/to/blob
With the WASB driver, tools like HDInsight can connect to Blob Storage on the same blob endpoint (blob.core.windows.net).
3) ADLS with ABFSS
(Ignore ADLS Gen1, which is a separate service and is now deprecated.)
See this answer for the difference between Blob Storage and ADLS.
Then came ADLS Gen2 (Azure's HDFS offering), which supports hierarchical storage (real folders) with features like ACLs on files and folders. Enabling the hierarchical namespace feature on a storage account turns it from Blob Storage into ADLS Gen2. To talk to ADLS Gen2, the DFS endpoint (dfs.core.windows.net) is used.
abfss://container@storageaccount.dfs.core.windows.net/path/to/file
Hadoop applications can now use the ABFS driver to connect to ADLS. Because of the new DFS endpoint, the driver is much more efficient and no longer needs a complex mapping layer. Solutions like Hortonworks, HDInsight, and Azure Databricks can connect to ADLS far more efficiently using the ABFSS driver.
Also, you will notice that some tools, such as Power BI, support both WASBS and ABFSS.
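As a rough side-by-side of the three addressing styles described above, the sketch below builds the HTTPS, WASBS, and ABFSS forms for the same account, container, and path. The helper name `storage_uris` is hypothetical, and the `core.windows.net` suffix assumes the public Azure cloud. It only formats strings and does not contact any service.

```python
def storage_uris(account: str, container: str, path: str) -> dict:
    """Format the same object location in the three styles discussed above."""
    path = path.lstrip("/")
    return {
        # Plain blob endpoint: container is part of the URL path.
        "https": f"https://{account}.blob.core.windows.net/{container}/{path}",
        # WASBS: container moves in front of the host, still the blob endpoint.
        "wasbs": f"wasbs://{container}@{account}.blob.core.windows.net/{path}",
        # ABFSS: same shape as WASBS, but against the dfs endpoint.
        "abfss": f"abfss://{container}@{account}.dfs.core.windows.net/{path}",
    }


for scheme, uri in storage_uris("storageaccount", "container", "path/to/blob").items():
    print(f"{scheme:5} {uri}")
```

The main visible differences are the scheme and the endpoint (`blob` versus `dfs`); the container-before-host layout is shared by both Hadoop drivers.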
What to use?
If ADLS is used,
If Blob storage is used,
Update 1:
Microsoft has deprecated the Windows Azure Storage Blob driver (WASB) in favor of the Azure Blob Filesystem driver (ABFS). ABFS has numerous benefits over WASB. Use ABFS for both Blob Storage and Data Lake for newer workloads.
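If you are moving existing jobs from WASB to ABFS, most of the change is in the paths. Below is a minimal sketch of a URI rewriter, assuming the public-cloud endpoint suffix `core.windows.net` and the standard `container@account` layout. The function name `wasb_to_abfs` is my own, not part of any Azure or Hadoop library, and the account still has to be reachable through the ABFS driver for the rewritten path to work.

```python
from urllib.parse import urlsplit

# wasb -> abfs, wasbs -> abfss (the trailing "s" means TLS in both drivers).
_SCHEME_MAP = {"wasb": "abfs", "wasbs": "abfss"}
_BLOB_SUFFIX = ".blob.core.windows.net"  # assumption: public Azure cloud


def wasb_to_abfs(uri: str) -> str:
    """Rewrite a wasb[s]:// URI into the equivalent abfs[s]:// URI."""
    parts = urlsplit(uri)
    if parts.scheme not in _SCHEME_MAP:
        raise ValueError(f"not a wasb[s] URI: {uri!r}")
    container, sep, host = parts.netloc.partition("@")
    if not sep or not host.endswith(_BLOB_SUFFIX):
        raise ValueError(f"unexpected authority in {uri!r}")
    account = host[: -len(_BLOB_SUFFIX)]
    scheme = _SCHEME_MAP[parts.scheme]
    # Swap the blob endpoint for the dfs endpoint; container and path stay as-is.
    return f"{scheme}://{container}@{account}.dfs.core.windows.net{parts.path}"


print(wasb_to_abfs("wasbs://container@storageaccount.blob.core.windows.net/path/to/blob"))
```

Anything that is not a `wasb`/`wasbs` URI is rejected rather than passed through, so mixed path lists fail loudly instead of being half-converted.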
Upvotes: 40
Reputation: 407
ABFS stands for Azure Blob File System. Microsoft recommends it for big data workloads because it is optimized for them, as mentioned here.
WASBS stands for Windows Azure Storage Blob (the trailing S means secure). Microsoft recommends it over plain WASB because it provides TLS-encrypted access, as mentioned here.
Upvotes: 13
Reputation: 30015
The differences and use cases are as follows:
ABFS[S] is used for Azure Data Lake Storage Gen2, which is built on a normal Azure storage account (when creating the storage account, enable Hierarchical namespace and you get an Azure Data Lake Storage Gen2 account). An example is here.
WASB[S] is used for a normal Azure storage account. An example is here.
Upvotes: 30