Reputation: 1223
I have created an external Hive table with a specified schema but without data; call it table A. Now let us say I have CSV files in an HDFS dir organized in the following way:
20150718/dir1/dir2/file1.csv
20150718/dir1/dir2/file2.csv
...................
20150718/dir1/dir2/..../dirN/file10000.csv
In other words, the files could sit at multiple different directory levels under the dir 20150718. How can I load these CSV files with one Hive/shell command?
Another note: I plan to create partitions based on date as time goes on; how should I proceed? I am still a new Hive user, so advice is appreciated.
Upvotes: 0
Views: 1308
Reputation: 729
// Get the configuration
Configuration conf = getConf();
FileSystem fs = inputPath.getFileSystem(conf);

// Specify the filter -- FileFilter here is a custom PathFilter that accepts
// directories plus files with the given extensions ("csv" in your case)
PathFilter pf = new FileFilter(conf, fs, new String[] { "csv" });

// Move or copy recursively into the flat target directory
moveRecursivelyToTarget(target, fs, inputPath, pf);

protected void moveRecursivelyToTarget(Path target, FileSystem fs, Path path, PathFilter inputFilter)
        throws IOException
{
    for (FileStatus stat : fs.listStatus(path, inputFilter)) {
        if (stat.isDirectory()) {
            moveRecursivelyToTarget(target, fs, stat.getPath(), inputFilter);
        } else {
            // Move the file; both source and target live on HDFS,
            // so use rename (copyFromLocalFile only works for local sources).
            fs.rename(stat.getPath(), new Path(target, stat.getPath().getName()));
        }
    }
}
You can follow the same procedure in the shell too.
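A minimal shell sketch of the same recursive collect-and-move: the SRC and DEST paths are assumptions (point DEST at whatever HDFS directory backs your staging table), and `-ls -R` does the recursion for you.

```shell
# SRC and DEST are placeholder paths -- adjust to your layout.
SRC=/data/20150718
DEST=/user/hive/warehouse/tablea

# -ls -R lists recursively; the last field of each line is the full path.
hdfs dfs -ls -R "$SRC" | awk '{print $NF}' | grep '\.csv$' |
while read -r f; do
  hdfs dfs -mv "$f" "$DEST/"
done
```

Note that this flattens all the CSVs into one directory; if file names can collide across subdirectories, rename them as you move.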
For creating dynamic partitions, put the files collected above into a staging table, call it tableA. Then read from tableA and write into tableMain with the partition, and you can clean out tableA each day.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- the dynamic partition column must come last in the SELECT
INSERT OVERWRITE TABLE tableMain PARTITION (`date`)
SELECT x, y, z, `date`
FROM tableA t;
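For completeness, hypothetical table definitions that would fit the INSERT above. The column names x, y, z, the STRING types, and the LOCATION are all assumptions; adjust them to your CSV layout.

```sql
-- Staging table over the flattened CSV files (schema is an assumption)
CREATE EXTERNAL TABLE tableA (x STRING, y STRING, z STRING, `date` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/tablea';

-- Main table, partitioned by date
CREATE TABLE tableMain (x STRING, y STRING, z STRING)
PARTITIONED BY (`date` STRING)
STORED AS TEXTFILE;
```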
Upvotes: 1