Krumelur

Reputation: 33068

What is the difference between Azure's "Data Lake Storage Gen2" and "Data Lake Gen2"?

I'm confused by the options available when creating a storage account on Azure and am looking for clarification.

If I create a new "Storage Account" (Standard tier) from the Azure marketplace, I'm given the option to activate "Data Lake Storage Gen2" in the advanced settings:


Once the resource is deployed, I see the option to do a "Data Lake Gen2 upgrade".

If I create the storage account using the premium tier with block blobs, the upgrade option does not show up. However, it does show up when I select page blobs.

The only visible difference I can spot in the JSON of the ARM deployment is that the upgraded storage account has isHnsEnabled: true. This seems to enable hierarchical namespaces and atomic directory operations. The UI also shows different icons: the upgraded account gets a database icon with some water in it, while the non-upgraded one shows a folder in the storage browser.
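A minimal sketch of the check described above, assuming the exported ARM JSON contains resources of type `Microsoft.Storage/storageAccounts` with an optional `properties.isHnsEnabled` field. The JSON below is a trimmed, hypothetical example (the account name `examplestorage` is made up), not a real export.

```python
import json

# Trimmed, hypothetical ARM export; only the fields relevant here are kept.
ARM_EXPORT = """
{
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "examplestorage",
      "kind": "StorageV2",
      "sku": {"name": "Standard_LRS"},
      "properties": {"isHnsEnabled": true}
    }
  ]
}
"""


def hns_status(template_text: str) -> dict:
    """Map each storage account name to its isHnsEnabled flag.

    A missing flag is treated as False (flat namespace).
    """
    template = json.loads(template_text)
    status = {}
    for resource in template.get("resources", []):
        if resource.get("type") == "Microsoft.Storage/storageAccounts":
            props = resource.get("properties", {})
            status[resource.get("name")] = bool(props.get("isHnsEnabled", False))
    return status


print(hns_status(ARM_EXPORT))  # {'examplestorage': True}
```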

To the questions:

  1. I can create folders in the non-upgraded blob storage, although hierarchical namespaces are not enabled. Does this mean that folder operations just take longer?
  2. If I upgrade my standard tier storage account to Data Lake Gen2, it remains in the standard tier. But how would I create a standard tier storage account with Data Lake Gen2 support right from the beginning, without upgrading later? I thought the checkbox "Data Lake Storage Gen2" would do this, but apparently it doesn't.
  3. What is then the difference between "Data Lake Storage Gen2" and "Data Lake Gen2"?

Upvotes: 1

Views: 2579

Answers (1)

Peter Bons

Reputation: 29840

I can create folders in the non-upgraded blob storage, although hierarchical namespaces are not enabled. Does this mean that folder operations just take longer?

I doubt it. Can you tell me how you did that? You can upload a blob to a virtual folder by naming it accordingly. For example, a blob named folder/test.png uploaded to a container named container will appear as a file inside a folder named folder in the storage account explorer, but no actual folder is created.
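To illustrate why such folders are only virtual, here is a hypothetical in-memory model of a flat namespace, not the Azure SDK: a container is just a mapping from blob name to content, and "folders" are derived from the `/` in the names, similar in spirit to a delimiter-based listing such as `ContainerClient.walk_blobs(delimiter="/")` in the azure-storage-blob package. The helper names `list_level` and `rename_virtual_folder` are made up for this sketch. Renaming a virtual folder means rewriting every blob under the prefix one by one, which is the kind of per-blob work that atomic directory operations with hierarchical namespace are meant to avoid.

```python
# Hypothetical flat namespace: blob name -> content. No folder objects exist.
container = {
    "folder/test.png": b"...",
    "folder/sub/a.txt": b"a",
    "root.txt": b"r",
}


def list_level(blobs, prefix="", delimiter="/"):
    """Return (blob names, virtual folder prefixes) directly under `prefix`."""
    files, folders = [], set()
    for name in sorted(blobs):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is reported as a "folder".
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            files.append(name)
    return files, sorted(folders)


def rename_virtual_folder(blobs, old, new):
    """'Rename' a virtual folder by rewriting each blob under the prefix.

    Returns the number of blobs touched, which grows with the folder's size.
    """
    moved = [name for name in blobs if name.startswith(old)]
    for name in moved:
        blobs[new + name[len(old):]] = blobs.pop(name)
    return len(moved)


print(list_level(container))             # (['root.txt'], ['folder/'])
print(list_level(container, "folder/"))  # (['folder/test.png'], ['folder/sub/'])
```

Note that once the last blob with a given prefix is removed, the virtual folder simply disappears from the listing, because nothing else represents it.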

What is then the difference between "Data Lake Storage Gen2" and "Data Lake Gen2"?

Azure Data Lake is an umbrella for several services, such as Azure HDInsight and Azure Data Lake Analytics. Azure Data Lake is a solution, not a specific product. Azure Data Lake Storage is a product that is part of the Azure Data Lake solution.

There is a Gen1 and a Gen2, and Gen2 is built on storage accounts: "Data Lake Storage Gen2 converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage."

If I upgrade my standard tier storage account to Data Lake Gen2, it remains in the standard tier. But how would I create a standard tier storage account with Data Lake Gen2 support right from the beginning, without upgrading later? I thought the checkbox "Data Lake Storage Gen2" would do this, but apparently it doesn't.

It does work for me. What makes you think setting the checkbox does not do the job? With this option enabled I am able, for example, to create directories, something I cannot do with a regular storage account.
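For a template-based deployment, a minimal sketch of the equivalent is a Standard general-purpose v2 account that sets `isHnsEnabled` to true at creation time, the same property the question found on the upgraded account. The account name, location, SKU, and `apiVersion` below are placeholder assumptions; check them against the current ARM reference before deploying.

```python
import json


def storage_account_resource(name, location="westeurope", hns=True):
    """Build a hypothetical ARM resource for a Standard StorageV2 account.

    `name`, `location`, the SKU, and the apiVersion are placeholders.
    """
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2021-09-01",  # assumed; use a version valid for your setup
        "name": name,
        "location": location,
        "kind": "StorageV2",              # general-purpose v2
        "sku": {"name": "Standard_LRS"},  # standard tier
        # Hierarchical namespace enabled from the start, no later upgrade needed.
        "properties": {"isHnsEnabled": hns},
    }


template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [storage_account_resource("examplestorage")],
}

print(json.dumps(template, indent=2))
```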


As for which types of storage accounts are supported: among premium accounts, only block blob accounts are supported, as the docs state:

Data Lake Storage capabilities are supported in the following types of storage accounts:

  • Standard general-purpose v2
  • Premium block blob
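The two supported combinations above can be encoded as a small lookup. This is a sketch under assumptions: it uses ARM naming for account kinds (`StorageV2`, `BlockBlobStorage`) and SKUs (`Standard_LRS`, `Premium_LRS`, ...), and treats the SKU prefix as the tier, which is a simplification. The function name `supports_data_lake_gen2` is hypothetical.

```python
def supports_data_lake_gen2(kind: str, sku_name: str) -> bool:
    """Return True for the two account types listed in the docs quoted above."""
    tier = sku_name.split("_", 1)[0]  # "Standard_LRS" -> "Standard"
    if kind == "StorageV2" and tier == "Standard":
        return True  # Standard general-purpose v2
    if kind == "BlockBlobStorage" and tier == "Premium":
        return True  # Premium block blob
    return False


print(supports_data_lake_gen2("StorageV2", "Standard_GRS"))     # True
print(supports_data_lake_gen2("BlockBlobStorage", "Premium_LRS"))  # True
print(supports_data_lake_gen2("StorageV2", "Premium_LRS"))      # False
```

Any other combination, including a premium general-purpose v2 account (the kind assumed here for premium page blobs), falls outside both listed types and returns False.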

Upvotes: 1
