Reputation: 3696
An HDInsight cluster must have at least one Azure storage account as its default storage account, because the cluster uses it as its filesystem. This I get. But what about the optional linked Azure storage accounts? From the ADF (Azure Data Factory) perspective at least, do we need to add a storage account as a linked storage account on an HDInsight cluster? An Azure storage account is accessible with just two pieces of information: the account name and the key. Both are specified in the linked services in ADF, which is enough to grant access to the storage account. So what is the real benefit of adding an account as linked storage, from the ADF point of view or otherwise? Basically, what I am asking is: is there anything we can't do using only the account name and key, without adding the account as linked storage for the given HDInsight cluster?
Upvotes: 0
Views: 173
Reputation: 3696
I think I sort of figured out the answer. With linked storage accounts, the cluster, when used as a compute resource, can directly access blobs on those storage accounts without requiring us to specify the storage keys separately in queries. That's the use case for which linked storage is a must-have.
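To make that concrete, here is a minimal sketch of the two strings involved when a job reads a blob. It assumes the standard WASB URI form and the Hadoop property name `fs.azure.account.key.<account>.blob.core.windows.net`; the helper names and the account, container, and path values are made up for illustration. For a linked account the cluster configuration already holds the key under that property, so a query only needs the URI. For an account that is not linked, the job would have to supply the property itself.

```python
def wasb_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build the URI a Hive/Pig/MapReduce job uses to address a blob.

    Hypothetical helper: only the URI layout comes from the WASB driver.
    """
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_property(account: str) -> str:
    """Name of the Hadoop setting that holds the key for a storage account.

    Linked accounts have this set cluster-wide; for other accounts the
    job must pass it in explicitly.
    """
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


if __name__ == "__main__":
    # "mystore" and "data" are placeholder names, not real resources.
    print(wasb_uri("mystore", "data", "/logs/2016"))
    print(account_key_property("mystore"))
```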
Upvotes: 0
Reputation: 1511
The main reason to have additional accounts is that each storage account has limits. A storage account can hold 500 TB of data and serve 20,000 requests per second. Depending on the size of your cluster and your workload, you might hit the request limit. If you are worried about those limits and you don't want to manage a lot of storage accounts, you should look into Azure Data Lake.
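As a rough back-of-the-envelope sketch, the limits quoted above can be turned into an estimate of how many accounts a workload would need. The 500 TB and 20,000 requests per second figures are taken from this answer and may not match current Azure limits; the function name and the sample workload numbers are assumptions for illustration.

```python
import math

# Per-account limits as quoted in the answer above (may be outdated).
MAX_TB_PER_ACCOUNT = 500
MAX_REQUESTS_PER_SECOND = 20_000


def accounts_needed(data_tb: float, peak_requests_per_second: float) -> int:
    """Estimate how many storage accounts are needed to stay under both limits.

    Assumes data and requests can be spread evenly across accounts,
    which real workloads rarely achieve.
    """
    by_capacity = math.ceil(data_tb / MAX_TB_PER_ACCOUNT)
    by_throughput = math.ceil(peak_requests_per_second / MAX_REQUESTS_PER_SECOND)
    return max(1, by_capacity, by_throughput)


if __name__ == "__main__":
    # Hypothetical workload: 1,200 TB of data, 30,000 requests/s at peak.
    print(accounts_needed(1200, 30_000))
```

In this hypothetical case capacity is the binding limit (three accounts by size, two by request rate), but for a large cluster over a small dataset the request rate would dominate instead.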
Upvotes: 1