xine78
xine78

Reputation: 35

Hive - How to load data from a file with filename as a column?

I am running the following commands to create my table ABC and insert data from all files that are in my designated file path. Now I want to add a column with filename, but I can't find any way to do that without looping through the files or something. Any suggestions on what the best way to do this would be?

CREATE TABLE ABC
(NAME string
,DATE string
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE;

hive -e "LOAD DATA LOCAL INPATH '${DATA_FILE_PATH}' INTO TABLE ABC;"

Upvotes: 1

Views: 4813

Answers (1)

Climbs_lika_Spyder
Climbs_lika_Spyder

Reputation: 6714

Hive does have virtual columns, which include INPUT__FILE__NAME. The link shows how to use this in a statement.

To fill another table with the filename as a column. Assuming your location of your data is hdfs://hdfs.location:port/data/folder/filename1

DROP TABLE IF EXISTS ABC2;
CREATE TABLE ABC2 (
    filename STRING COMMENT 'this is the file the row was in',
    name STRING,
    date STRING);
INSERT INTO TABLE ABC2 SELECT split(INPUT__FILE__NAME,'folder/')[1],* FROM ABC;

You can alter the split to change how much of the full path you actually want to store.

Upvotes: 1

Related Questions