Reputation: 344
I am trying to load blob data from Azure Blob Storage into a Hive table. The data, saved in a .csv file called myblob_test.csv, has the following format:
The following script was used to create the table:
CREATE TABLE IF NOT EXISTS AzureData.Events(
Day STRING,
Event_Type STRING,
Time_Stamp STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/bigdatapoc1/azure-data-2/myblob_test.csv';
My file is saved at the following location:
When I run the table create command, it completes without error. But when I query:
select * from AzureData.Events;
I get nothing back. So I tried to load the blob file using the following command:
LOAD DATA INPATH 'wasb://[email protected]/myblob_test.csv' INTO TABLE AzureData.Events;
I get the following error:
or
LOAD DATA INPATH '/bigdatapoc1/azure-data-2/myblob_test.csv' INTO TABLE AzureData.Events;
gives the following error:
I am not sure what I am doing wrong. Can somebody point out where I am missing a step?
Upvotes: 0
Views: 1315
Reputation: 679
First you need to understand that blob containers have either private or public access permissions. If the container is public, your cluster can access it directly. Otherwise, you will need to add the Azure storage account(s) as additional storage account(s) during the provisioning process. The provisioning process writes the storage account access keys into the site config file, so that the cluster can access the container. For adding additional storage accounts, see https://azure.microsoft.com/en-us/documentation/articles/hdinsight-provision-clusters/.
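As an illustration of what provisioning sets up (placeholder names, not taken from your question), each linked storage account ends up with an entry along these lines in the cluster's core-site.xml:

```xml
<!-- Hypothetical fragment of core-site.xml: "myaccount" is a placeholder
     storage account name; the value is that account's access key. -->
<property>
  <name>fs.azure.account.key.myaccount.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_KEY</value>
</property>
```

If no such key entry exists for the account you are referencing, the cluster cannot read from that container unless it is public.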
To access the blob container, you use the following syntax: wasb[s]://&lt;containername&gt;@&lt;accountname&gt;.blob.core.windows.net/&lt;path&gt;. For more information, see https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/
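For example, a fully qualified load statement would look like the following (sketch only; "mycontainer" and "myaccount" are placeholder names, substitute your own container and storage account):

```sql
-- Placeholder container and account names, for illustration only
LOAD DATA INPATH 'wasb://mycontainer@myaccount.blob.core.windows.net/myblob_test.csv'
INTO TABLE AzureData.Events;
```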
Upvotes: 1