Reputation: 321
There is a directory that contains multiple files yet to be analyzed, for example file1, file2, and file3.
I want to run
load data inpath 'path/to/*' overwrite into table demo
instead of
load data inpath 'path/to/file1' overwrite into table demo
load data inpath 'path/to/file2' overwrite into table demo
load data inpath 'path/to/file3' overwrite into table demo
However, it just doesn't work. Is there an easier way to do this?
Upvotes: 1
Views: 8298
Reputation: 1
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
The target being loaded to can be a table or a partition. If the table is partitioned, you must specify a particular partition by providing values for all of the partitioning columns.
filepath can refer to a file (in which case Hive moves the file into the table) or to a directory (in which case Hive moves all the files within that directory into the table). In either case, filepath addresses a set of files.
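Since filepath may be a directory, the three per-file loads in the question can be collapsed into one statement that names the directory itself. Here is a minimal sketch in shell that only prints such a statement; build_load_stmt is a hypothetical helper name, not something Hive ships, and the path and table name are simply the ones from the question.

```shell
# Hypothetical helper (not part of Hive): print one LOAD DATA statement
# that targets a whole directory rather than individual files.
# $1 = directory path, $2 = table name
build_load_stmt() {
  # ${1%/} drops a single trailing slash, if present
  printf "LOAD DATA INPATH '%s' OVERWRITE INTO TABLE %s;\n" "${1%/}" "$2"
}

# Path and table name are the ones from the question.
build_load_stmt 'path/to/' demo
```

Assuming the hive CLI is on your PATH, the printed statement could then be run with something like hive -e "$(build_load_stmt 'path/to/' demo)". With OVERWRITE, a single directory load replaces the table contents with all files in the directory, whereas three separate OVERWRITE loads would each wipe out the previous one.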
Upvotes: 0
Reputation: 1202
1) Directory contains three files
-rw-r--r-- 1 hadoop supergroup 125 2017-05-15 17:53 /hallfolder/hall.csv
-rw-r--r-- 1 hadoop supergroup 125 2017-05-15 17:53 /hallfolder/hall1.csv
-rw-r--r-- 1 hadoop supergroup 125 2017-05-15 17:54 /hallfolder/hall2.csv
2) Enable recursive reading of input directories
SET mapred.input.dir.recursive=true;
3) Run the load from the hive> prompt
load data inpath '/hallfolder/*' into table alltable;
Upvotes: 0
Reputation: 41
Creating a Hive table with the directory as the LOCATION parameter makes Hive read all the files in that location automatically. For example:
CREATE [EXTERNAL] TABLE db.tbl(
column1 string,
column2 int ...)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY (delimiter)
LINES TERMINATED BY '\n'
-- Do not point LOCATION at a specific file; point it at the directory.
LOCATION '/path/to/';
Hive will automatically parse all the data within the folder and apply the table definition you created to it. As long as all files in that path are in the same format, you are good to go.
Upvotes: 1
Reputation: 44941
load data inpath is an HDFS metadata operation.
The only thing it does is move files from their current location to the table location.
Again, "moving" (unlike "copying") is a metadata operation, not a data operation.
If the OVERWRITE keyword is used then the contents of the target table (or partition) will be deleted and replaced by the files referred to by filepath; otherwise the files referred to by filepath will be added to the table.
load data inpath 'path/to/file1' into table demo;
load data inpath 'path/to/file2' into table demo;
load data inpath 'path/to/file3' into table demo;
or, using the ? wildcard (which matches exactly one character)
load data inpath 'path/to/file?' into table demo;
or (from the Hive CLI)
dfs -mv path/to/file? ...{path to demo}.../demo;
or (from bash)
hdfs dfs -mv path/to/file? ...{path to demo}.../demo
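To see which names a pattern like file? would pick up, here is a small sketch that uses a local temporary directory as a stand-in for the HDFS path. It is only an analogy: nothing here touches HDFS or Hive, and the file names (including the extra file10) are made up for illustration. Hadoop glob patterns treat ? as "exactly one character" in the same way the shell does.

```shell
# Local-filesystem analogy only: HDFS is not touched here.
tmp="$(mktemp -d)"
touch "$tmp/file1" "$tmp/file2" "$tmp/file3" "$tmp/file10"

# Print how many names the (already expanded) glob produced.
count_matches() {
  echo "$#"
}

single=$(count_matches "$tmp"/file?)   # file1 file2 file3
any=$(count_matches "$tmp"/file*)      # the three above plus file10
echo "file? matched $single, file* matched $any"

rm -rf "$tmp"
```

So file? is the narrower choice when the directory may also hold names such as file10 that you do not want moved into the table.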
Upvotes: 3