Reputation: 8259
Here is my setup:
wasb://mybucket
set up as the default FS. What I want to do is:
local1 > ssh client1
client1> hadoop fs -ls / #list contents of blob storage bucket.
I've copied the following keys to /etc/hadoop/conf/core-site.xml from the core-site.xml on the HDInsight head node:
...ShellDecryptionKeyProvider
Unfortunately, this configuration requires a ShellDecryptionKeyProvider to call out to. On Windows this is a command-line executable, but I don't know how to provide one for Linux.
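For reference, the copied properties look roughly like this (a sketch from memory; the account name matches my setup and the encrypted key value is elided):

<property>
  <name>fs.azure.account.keyprovider.mybucket.blob.core.windows.net</name>
  <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
</property>
<property>
  <name>fs.azure.account.key.mybucket.blob.core.windows.net</name>
  <value><!-- encrypted key copied from the head node --></value>
</property>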
Here's the output:
[rathboma@client1 yum.repos.d]$ hadoop fs -ls /
15/03/04 23:02:12 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
15/03/04 23:02:13 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
15/03/04 23:02:13 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
ls: org.apache.hadoop.fs.azure.KeyProviderException: Script path is not specified via fs.azure.shellkeyprovider.script
Has anyone managed to talk to blob storage from a Linux machine on Azure? How do I need to configure it?
Upvotes: 2
Views: 1724
Reputation: 2486
Instead of trying to use Hadoop fs commands, would it work to just access the storage directly? If you look at https://www.npmjs.com/package/azure-storage you will find that you can access blob storage directly via Node instead of relying on the Hadoop classes. The following example should list out all of the files/blobs in your storage container:
var azure = require('azure-storage');

var account = 'storaccount'; // just the first part of the storage account name (the part before the first '.')
var key = 'BASE64KEYGOESHERE=='; // retrieve the key from the Storage section of the Azure Portal
var container = 'clustercontainer'; // the container name associated with the cluster

var blobService = azure.createBlobService(account, key);

// listBlobsSegmented is asynchronous, so a do/while loop around it would
// test the continuation token before the callback has run; recurse on the
// token instead to walk every result segment.
var i = 0;
function listBlobs(ct) {
  blobService.listBlobsSegmented(container, ct, function(error, result, response) {
    if (error) {
      console.log("Error:");
      console.log(error);
      return;
    }
    i++;
    console.log("Result set", i, ":");
    result.entries.forEach(function(blob) { console.log(blob.name); });
    console.log("Continuation? : ", result.continuationToken);
    if (result.continuationToken) { listBlobs(result.continuationToken); }
  });
}
listBlobs(null);
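To try this out, install the package and run the script (the filename here is arbitrary): npm install azure-storage, then node listblobs.js.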
There are several other APIs available (Java, Python), and the cross-platform CLI (https://github.com/Azure/azure-xplat-cli) may be better suited depending on how you need to interact with the storage.
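With the xplat CLI installed, a one-off listing would look something like this (exact flags may vary by CLI version):

azure storage blob list clustercontainer --account-name storaccount --account-key BASE64KEYGOESHERE==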
If you really want to use the Hadoop fs functions from client1, you can try removing the fs.azure.account.keyprovider.mybucket.blob.core.windows.net property from client1's settings file and putting the unencrypted storage access key into fs.azure.account.key.mybucket.blob.core.windows.net instead. If a key provider is not specified, the access key should be used as-is.
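In core-site.xml terms, that would look something like this (a sketch; substitute your real account name and key):

<!-- keyprovider property removed entirely; only the plain access key remains -->
<property>
  <name>fs.azure.account.key.mybucket.blob.core.windows.net</name>
  <value>BASE64KEYGOESHERE==</value>
</property>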
Upvotes: 1