Reputation: 621
The following HQL works to create a Hive table in HDInsight which I can successfully query. But, I have several questions about WHY it works:
My data rows are, in fact, terminated by carriage return line feed, so why does 'COLLECTION ITEMS TERMINATED BY \002' work? And what is \002 anyway? And no location for the blob is specified so, again, why does this work?
All attempts at creating the same table and specifying "CREATE EXTERNAL TABLE...LOCATION '/user/hive/warehouse/salesorderdetail'" have failed. The table is created but no data is returned. Leave off "external" and don't specify any location and suddenly it works. Wtf?
CREATE TABLE IF NOT EXISTS default.salesorderdetail( SalesOrderID int, ProductID int, OrderQty int, LineTotal decimal ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '\002' MAP KEYS TERMINATED BY '\003' STORED AS TEXTFILE
Any insights are greatly appreciated.
UPDATE:Thanks for the help so far. Here's the exact syntax I'm using to attempt external table creation. (I've only changed the storage account name.) I don't see what I'm doing wrong.
drop table default.salesorderdetailx; CREATE EXTERNAL TABLE default.salesorderdetailx(SalesOrderID int, ProductID int, OrderQty int, LineTotal decimal) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '\002' MAP KEYS TERMINATED BY '\003' STORED AS TEXTFILE LOCATION 'wasb://[email protected]/mycn-1/hive/warehouse/salesorderdetailx'
Upvotes: 0
Views: 408
Reputation: 5225
When you create your cluster in HDInsight, you have to specify underlying blob storage. It assumes that you are referencing that blob storage. You don't need to specific a location because your query is creating an internal table (see answer #2 below) which is created at a default location. External tables need to specify a location in Azure blob storage (outside of the cluster) so that the data in the table is not deleted when the cluster is dropped. See the Hive DDL for more information.
By default, tables are created as internal, and you have to specify the "external" to make them external tables.
Use EXTERNAL tables when:
Use INTERNAL tables when:
Does the container "user/hive/warehouse/salesorderdetail" exist in your blob storage? That might explain why it is failing for your external table query.
Upvotes: 2