themightyhulk
themightyhulk

Reputation: 175

Create Hive table and insert data from xls file

I have gotten a project task from my supervisor who claims it is possible to use Hive within HDInsight (for Windows) in order to query two different file types and then extract data from them. One of these files is a .xls and the other a .csv file.

I have managed to upload both of those files to the Hadoop cluster with VS and then tried creating a Hive table from the .xls file (I used tutorials before that made use of .csv files working just fine with Hive) but with the .xls file I constantly receive the "Failed" error when trying.

I tried the following sample code for creating the table, delimiting, field termination (of which I tried several ones without success), file type (not sure what else to use here either) and target location.

DROP TABLE IF EXISTS table1;

CREATE EXTERNAL TABLE IF NOT EXISTS table1(id int, postcde int, city string) 
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ' ' 
    STORED AS TEXTFILE LOCATION 'wasb://[email protected]/folder1/data.xls/'

I am unsure as to whether or not this is possible at all since there seem to be contradictory responses to a similar question here but according to my supervisor, there should be a possibility to do this with Hive - without having to convert file types to the same format before uploading to Hadoop!

https://community.hortonworks.com/questions/31968/hi-is-there-a-way-to-load-xlsx-file-into-hive-tabl.html

Upvotes: 1

Views: 29578

Answers (2)

Jörn Franke
Jörn Franke

Reputation: 186

The HadoopOffice library has a Hive module (besides one for Hadoop/Spark/Flink) that allows using Excel files (.xlsx,.xls) with various features in Hive: https://github.com/ZuInnoTe/hadoopoffice/wiki

Upvotes: 1

Yang Bryan
Yang Bryan

Reputation: 451

I don't think you can assign an excel file as the location of the table.

Hive can save data in HDFS (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) or

S3 storage (http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html)

Hive doesn't support EXCEL format directly, so you have to convert excel files to a delimited format file, then use load command to upload the file into Hive(or HDFS).

Upvotes: 6

Related Questions