Reputation: 6756
I am taking a MOOC. It told us to upload a few files from our PC to HDFS using commands like the one below:
azure storage blob upload local_path container data/logs/2008-01.txt.gz
I did the same. Later on, when I typed the command below in a PuTTY SSH session, I was able to see the files:
hdfs dfs -ls /data/logs
Found 6 items
-rwxrwxrwx 1 331941 2016-03-03 15:56 /data/logs/2008-01.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-02.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-03.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-04.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-05.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-06.txt.gz
Then we started a Hive terminal, first created a table, and then loaded data into that table using:
load data inpath '/data/logs' into TABLE rawlog;
Then we created an external table using the command below:
CREATE EXTERNAL TABLE cleanlog
(log_date DATE,
log_time STRING,
c_ip STRING,
cs_username STRING,
s_ip STRING,
s_port STRING,
cs_method STRING,
cs_uri_stem STRING,
cs_uri_query STRING,
sc_status STRING,
sc_bytes INT,
cs_bytes INT,
time_taken INT,
cs_user_agent STRING,
cs_referrer STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE LOCATION '/data/cleanlog';
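For clarity: since FIELDS TERMINATED BY ' ' splits each line on single spaces, the mapping from a raw line to these columns works roughly like the Python sketch below (the sample line is made up for illustration, not taken from the actual logs):

```python
# Columns of the cleanlog table, in the same order as the DDL
COLUMNS = [
    "log_date", "log_time", "c_ip", "cs_username", "s_ip", "s_port",
    "cs_method", "cs_uri_stem", "cs_uri_query", "sc_status", "sc_bytes",
    "cs_bytes", "time_taken", "cs_user_agent", "cs_referrer",
]

def parse_line(line):
    """Split a space-delimited log line into a column dict,
    mimicking ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '."""
    return dict(zip(COLUMNS, line.strip().split(" ")))

# Hypothetical W3C-style log line, 15 space-separated fields
sample = ("2008-01-01 00:01:00 10.0.0.1 - 10.0.0.2 80 GET "
          "/default.aspx - 200 1000 500 60 Mozilla -")
row = parse_line(sample)
print(row["cs_method"])  # GET
```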
We inserted data into the table using:
INSERT INTO TABLE cleanlog
SELECT *
FROM rawlog
WHERE SUBSTR(log_date, 1, 1) <> '#';
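As I understand it, that WHERE clause just drops the IIS header lines, which begin with '#'. In Python terms (the sample lines below are made up for illustration):

```python
def is_data_line(line):
    """Mirror WHERE SUBSTR(log_date, 1, 1) <> '#':
    keep only lines whose first character is not '#'."""
    return line[:1] != "#"

lines = [
    "#Software: Microsoft Internet Information Services",  # header, dropped
    "#Fields: date time c-ip ...",                         # header, dropped
    "2008-01-01 00:01:00 10.0.0.1 ...",                    # data, kept
]
kept = [l for l in lines if is_data_line(l)]
print(len(kept))  # 1
```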
I exited out of Hive and typed in the command below:
hdfs dfs -ls /data/logs
Why do I see the file 000000_0 in my cleanlog folder? Is it the new table? If I type the command
hdfs dfs -ls /data/cleanlog
The output that I get is:
Found 1 items
-rwxr-xr-x 1 sshuser supergroup 71323206 2016-03-03 16:11 /data/cleanlog/000000_0
#################### Update 1

What if I upload a new file into /data/logs/ and then run select * from rawlog? Would it automatically pull data from the new file?

Upvotes: 2

Views: 258
Reputation: 38910
If you don't want to lose data in the source folder, use an external table. Have a look at this SE question:
Difference between `load data inpath` and `location` in Hive?
- I don't see anything in that folder. Why? Where did the uploaded log files go?

They were removed because the data was loaded into the table and you used LOAD DATA INPATH instead of an external table.
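To see the move semantics concretely, here is a local-filesystem sketch (temp directories standing in for the HDFS paths; these are plain Python file operations, not actual hdfs calls):

```python
import os
import shutil
import tempfile

# Temp dirs stand in for /data/logs (source) and the table's warehouse dir
src = tempfile.mkdtemp()
warehouse = tempfile.mkdtemp()
open(os.path.join(src, "2008-01.txt.gz"), "w").close()

# LOAD DATA INPATH moves files into the table directory, like a rename
for name in os.listdir(src):
    shutil.move(os.path.join(src, name), os.path.join(warehouse, name))

print(sorted(os.listdir(src)))        # [] -- the source folder is now empty
print(sorted(os.listdir(warehouse)))  # ['2008-01.txt.gz']
```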
- Where is the rawlog table? Does it exist in the same folder? Why don't I see it?

The table definition does not exist in the folder where the data resides. In your CREATE EXTERNAL TABLE statement, you have already specified /data/cleanlog as the location where the table's data is stored.
Have a look at the questions below on where Hive stores files in HDFS:
Where does Hive store files in HDFS?
I have created a table in hive, I would like to know which directory my table is created in?
- Why do I see the file 000000_0 in my cleanlog folder? Is it the new table?

It's not a new table. Execute this command in the Hive shell:
describe formatted <table_name>;
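The 000000_0 file you saw is just the output file written by your INSERT: Hive names output files after the writing task. A rough sketch of that naming convention (zero-padded task id, underscore, copy number; the exact scheme varies by Hive version, so treat this as an assumption):

```python
def hive_output_name(task_id, copy=0):
    # Assumed convention: six-digit zero-padded task id, then copy number
    return f"{task_id:06d}_{copy}"

print(hive_output_name(0))  # 000000_0
```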
EDIT: Regarding incremental updates to a table, follow the steps in this article and this question: Delta/Incremental Load in Hive
Upvotes: 2
Reputation: 11
You used the LOAD command, which MOVED the files from their original location to the folder for the rawlog table (which by default will be /hive/warehouse/rawlog).
Upvotes: 1