newbie

Reputation: 321

How to load multiple files into a table in Hive?

There is a directory which contains multiple files yet to be analyzed, for example, file1, file2, file3.

I want to

load data inpath 'path/to/*' overwrite into table demo

instead of

load data inpath 'path/to/file1' overwrite into table demo
load data inpath 'path/to/file2' overwrite into table demo
load data inpath 'path/to/file3' overwrite into table demo

However, it just doesn't work. Are there any easier ways to implement this?

Upvotes: 1

Views: 8298

Answers (4)

Ravi

Reputation: 1

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

filepath can be:

- a relative path, such as project/data1
- an absolute path, such as /user/hive/project/data1
- a full URI with scheme and (optionally) an authority, such as hdfs://namenode:9000/user/hive/project/data1

The target being loaded to can be a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns.

filepath can refer to a file (in which case Hive will move the file into the table) or it can be a directory (in which case Hive will move all the files within that directory into the table). In either case, filepath addresses a set of files.

Upvotes: 0

y durga prasad

Reputation: 1202

1) Directory contains three files

-rw-r--r--   1 hadoop supergroup        125 2017-05-15 17:53 /hallfolder/hall.csv
-rw-r--r--   1 hadoop supergroup        125 2017-05-15 17:53 /hallfolder/hall1.csv
-rw-r--r--   1 hadoop supergroup        125 2017-05-15 17:54 /hallfolder/hall2.csv

2) Set this property

  SET mapred.input.dir.recursive=true;

3) Run the load from the hive> prompt

load data inpath '/hallfolder/*' into table alltable;

Upvotes: 0

Ido Amit

Reputation: 41

Creating a Hive table with the directory path as its LOCATION parameter makes Hive read all the files in that location automatically. For example:

CREATE [EXTERNAL] TABLE db.tbl(
column1 string,
column2 int ...)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY (delimiter)
LINES TERMINATED BY '\n'
LOCATION '/path/to/';  -- point to the directory, not to a specific file

Hive will automatically parse all the data within the folder and apply the table definition you created to it. As long as all the files in that path are in the same format, you are good to go.

Upvotes: 1

David דודו Markovitz

Reputation: 44941

1.

load data inpath is an HDFS metadata operation.
The only thing it does is move files from their current location to the table location.
And again, "moving" (unlike "copying") is a metadata operation, not a data operation.

2.

If the OVERWRITE keyword is used then the contents of the target table (or partition) will be deleted and replaced by the files referred to by filepath; otherwise the files referred by filepath will be added to the table.

Language Manual DML-Loading files into tables
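A consequence of the quoted rule is that running the question's three statements with OVERWRITE on each would most likely leave only file3 in the table, since every load replaces the previous one. A minimal sketch of one way around that, assuming a hypothetical helper named gen_load_statements (not part of Hive): it only prints the statements, with OVERWRITE on the first and plain appends after it, so they can be reviewed before being passed to Hive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hive): print one LOAD DATA statement
# per file path. Only the first statement uses OVERWRITE; the later ones
# append, so the table ends up holding all of the listed files.
# Usage: gen_load_statements TABLE FILE...
gen_load_statements() {
  local table="$1"
  shift
  local overwrite="OVERWRITE "
  local path
  for path in "$@"; do
    printf "LOAD DATA INPATH '%s' %sINTO TABLE %s;\n" "$path" "$overwrite" "$table"
    overwrite=""   # every statement after the first appends
  done
}

# Table and paths taken from the question.
gen_load_statements demo path/to/file1 path/to/file2 path/to/file3
```

The printed statements could then be saved to a file and run with hive -f, or pasted at the hive> prompt.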

3.

load data inpath 'path/to/file1' into table demo;
load data inpath 'path/to/file2' into table demo;
load data inpath 'path/to/file3' into table demo;

or

load data inpath 'path/to/file?' into table demo;

or

dfs -mv path/to/file? ...{path to demo}.../demo

or (from bash)

hdfs dfs -mv path/to/file? ...{path to demo}.../demo

Upvotes: 3
