ramkasa

Reputation: 21

Load only a few files from an HDFS directory

I want to load only some of the files in an HDFS directory into a Hive table.

The HDFS directory contains the following files:

/data/log/user1log.csv
/data/log/user2log.csv
/data/log/user3log.csv
/data/log/user4log.csv
/data/log/user5log.csv

I want to load only /data/log/user1log.csv and /data/log/user2log.csv.

I tried the following:

CREATE EXTERNAL TABLE log_data (username string,log_dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
tblproperties ("skip.header.line.count"="1");

load data inpath '/data/log/user1log.csv' into table log_data;
load data inpath '/data/log/user2log.csv' into table log_data;

However, after I load the data into the table, the files disappear from their original HDFS location. I need the files to stay in that location.

Please help me.

Thanks in advance.

Upvotes: 0

Views: 76

Answers (1)

Gaurang Shah

Reputation: 12900

I don't think that's possible with LOAD DATA INPATH: it moves the files into the table's location rather than copying them.

However, since you have an external table, you can get the data into it without using LOAD DATA INPATH at all.

Here's how you can do it.

Specify a location for your Hive table:

CREATE EXTERNAL TABLE log_data (username string,log_dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
location '/data/log_data/table'
tblproperties ("skip.header.line.count"="1");

Copy the files into that location:

hdfs dfs -cp /data/log/user1log.csv /data/log_data/table/
hdfs dfs -cp /data/log/user2log.csv /data/log_data/table/
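If you need to pick out more than a couple of files, the copy step can be wrapped in a small loop. This is only a sketch, assuming bash and the standard hdfs dfs shell: the function name copy_into_table_dir is hypothetical, and the -f flag (overwrite a file of the same name already in the destination) is an optional choice, not part of the steps above.

```shell
#!/usr/bin/env bash
# Sketch: copy only the chosen files into the external table's location.
# hdfs dfs -cp copies, so the source files stay where they are
# (unlike LOAD DATA INPATH, which moves them).
set -euo pipefail

# Hypothetical helper: first argument is the table directory,
# remaining arguments are the HDFS files to copy into it.
copy_into_table_dir() {
  local table_dir="$1"
  shift
  # Create the table directory if it does not exist yet.
  hdfs dfs -mkdir -p "$table_dir"
  local f
  for f in "$@"; do
    # -f overwrites a same-named file already in the table directory.
    hdfs dfs -cp -f "$f" "$table_dir/"
  done
}
```

For the two files in the question you would call it as: copy_into_table_dir /data/log_data/table /data/log/user1log.csv /data/log/user2log.csv. Because the external table reads whatever files are in its location, no LOAD statement is needed afterwards.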

Upvotes: 2
