Jin

Reputation: 1223

How to recursively load multiple CSV files from one directory into Hive

I have created an external Hive table (call it table A) with a specified schema but no data yet. Now suppose I have CSV files in an HDFS directory organized like this:

20150718/dir1/dir2/file1.csv
20150718/dir1/dir2/file2.csv
...................
20150718/dir1/dir2/..../dirN/file10000.csv

In other words, the files can sit at many different directory levels under 20150718. How can I load all of these CSV files with a single Hive or shell command?

One more point: I plan to partition the data by date as time goes on. How should I go about that? I'm still new to Hive, so any advice is appreciated.

Upvotes: 0

Views: 1308

Answers (1)

rbyndoor

Reputation: 729

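import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

// Get the configuration and the file system that holds inputPath
Configuration conf = getConf();
FileSystem fs = inputPath.getFileSystem(conf);

// Specify the filter: by extension here (.csv), or by date in your case.
// Directories must also pass the filter, otherwise listStatus() hides them
// and the recursion never reaches the nested files.
PathFilter pf = p -> {
  try {
    return fs.getFileStatus(p).isDirectory() || p.getName().endsWith(".csv");
  } catch (IOException e) {
    return false;
  }
};

// Copy (or move) recursively into the target directory
moveRecursivelyToTarget(new Path(target), fs, inputPath, pf);

protected void moveRecursivelyToTarget(Path target, FileSystem fs, Path path, PathFilter inputFilter)
    throws IOException
{
  for (FileStatus stat : fs.listStatus(path, inputFilter)) {
    if (stat.isDirectory()) {
      moveRecursivelyToTarget(target, fs, stat.getPath(), inputFilter);
    } else {
      Path dest = new Path(target, stat.getPath().getName());
      // Source and destination are both in HDFS, so use FileUtil.copy;
      // copyFromLocalFile() would read the source from the local disk instead.
      FileUtil.copy(fs, stat.getPath(), fs, dest, false, fs.getConf());
      // Or move instead of copying:
      // fs.rename(stat.getPath(), dest);
    }
  }
}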
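For illustration only, here is a rough local-filesystem analogue of the recursive walk above. It uses java.nio.file.Files.walk instead of the Hadoop FileSystem API so it can run without a cluster, so treat it as a sketch of the same idea (descend into every directory, keep only .csv files), not as HDFS code. The class name CollectCsv is made up, and the demo tree in main simply mirrors the layout from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectCsv {

    // Walk the tree under root at any depth and return every regular file whose
    // name ends in ".csv". Directories are always descended into; the extension
    // check is applied to files only (the same rule the HDFS filter needs).
    static List<Path> findCsvFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".csv"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small local tree shaped like the one in the question
        Path root = Files.createTempDirectory("csvwalk").resolve("20150718");
        Path dir2 = Files.createDirectories(root.resolve("dir1").resolve("dir2"));
        Path dirN = Files.createDirectories(dir2.resolve("dir3").resolve("dirN"));
        Files.createFile(dir2.resolve("file1.csv"));
        Files.createFile(dir2.resolve("file2.csv"));
        Files.createFile(dirN.resolve("file10000.csv"));
        Files.createFile(dirN.resolve("notes.txt")); // skipped: not a CSV file

        // Print each CSV path relative to the date directory
        for (Path p : findCsvFiles(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Note that the files end up in one flat target directory in the HDFS version, so two CSV files with the same name in different subdirectories would collide; the question's naming (file1 ... file10000) avoids that.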

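You can follow the same procedure from the shell as well, for example by listing the files with hdfs dfs -ls -R and copying or moving them with hdfs dfs -cp or hdfs dfs -mv.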

To create dynamic partitions, load the files collected above into a staging table (call it tableA). Then read from tableA and write into tableMain with a partition column; after that you can clean out tableA for the day.

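set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- assumes tableA has a date column; the dynamic partition column must come last in the SELECT list
INSERT OVERWRITE TABLE tableMain PARTITION (`date`)
SELECT x, y, z, `date`
FROM tableA t;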
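To make the dynamic-partition step more concrete, here is a simplified in-memory model of what the INSERT above does. It is not Hive's implementation: it only shows the routing idea, where each staging row goes to a partition chosen by the value of its last column, the partition is named after Hive's col=value directory convention (date=20150718), and the partition value is dropped from the stored row because Hive keeps it in the directory name. The class and method names are hypothetical, and the x, y, z values stand in for tableA's columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DynamicPartitionSketch {

    // Group staging rows by the value of their LAST column, mirroring how
    // INSERT ... PARTITION (`date`) SELECT x, y, z, `date` takes the partition
    // value from the last SELECT column. Keys are partition directory names.
    static Map<String, List<List<String>>> routeToPartitions(List<List<String>> rows) {
        Map<String, List<List<String>>> partitions = new TreeMap<>();
        for (List<String> row : rows) {
            String partValue = row.get(row.size() - 1);
            // The stored row keeps only the data columns (x, y, z)
            List<String> dataCols = new ArrayList<>(row.subList(0, row.size() - 1));
            partitions.computeIfAbsent("date=" + partValue, k -> new ArrayList<>())
                      .add(dataCols);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<List<String>> staging = Arrays.asList(
                Arrays.asList("x1", "y1", "z1", "20150718"),
                Arrays.asList("x2", "y2", "z2", "20150718"),
                Arrays.asList("x3", "y3", "z3", "20150719"));
        routeToPartitions(staging).forEach((dir, rows) ->
                System.out.println("tableMain/" + dir + " -> " + rows));
    }
}
```

Running main prints one line per partition, for example tableMain/date=20150718 -> [[x1, y1, z1], [x2, y2, z2]], which is why a new day's data simply becomes a new partition directory.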

Upvotes: 1
