user720694

Reputation: 2075

Hive date based partitions

I have data on HDFS in the following form:

/basedir/yyyymmdd/fixedname/files

Here yyyymmdd is the date folder and files is the list of files added to that directory. I need a Hive table that picks up data from the yyyymmdd/fixedname directory, and it should keep working when I add a new date. For example, files added on 5th March 2013 go to the 20130305/fixedname folder, and files added on 6th March 2013 go to the 20130306/fixedname folder.

How do I alter a Hive table so that it picks up data from the changing date folder and the fixed folder within it?

Upvotes: 0

Views: 1569

Answers (1)

dbustosp

Reputation: 4478

Do you have a partitioned table? Let's say you already have a table partitioned by a date column and you want to add new data. In that case, you add the data to the new directory and tell Hive (specifically, the metastore) that the table has a new partition, using the ALTER TABLE ... ADD PARTITION command.
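As a minimal sketch of that first case, assuming an external table over the layout from the question: the table name `events`, the single column `line`, and the partition column `dt` are hypothetical (I use `dt` rather than `date` because `date` is a type name in newer Hive versions). Because each partition is given an explicit LOCATION, the directory does not need to be named `dt=...`.

```sql
-- Hypothetical table; adjust the columns and row format to your files.
CREATE EXTERNAL TABLE IF NOT EXISTS events (
  line STRING
)
PARTITIONED BY (dt STRING)
LOCATION '/basedir';

-- Register one day's directory as a partition in the metastore.
ALTER TABLE events ADD IF NOT EXISTS
  PARTITION (dt = '20130305')
  LOCATION '/basedir/20130305/fixedname';

-- Next day: the same statement with the new date.
ALTER TABLE events ADD IF NOT EXISTS
  PARTITION (dt = '20130306')
  LOCATION '/basedir/20130306/fixedname';
```

You would run one such ALTER TABLE statement per new date folder, for example from whatever job drops the files into HDFS.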

Now let's say you have not created any table yet. In that case you will have to create a partitioned table and then insert the data into it with queries. The magic happens when you set these two flags:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

These flags enable dynamic partitioning (for more details, read here).
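A rough sketch of such a dynamic-partition insert, under the assumption that the raw rows sit in a staging table that already carries the date as a column. The names `events_by_day`, `events_staging`, `line`, and `dt` are all hypothetical.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical target table, partitioned by the date string.
CREATE TABLE IF NOT EXISTS events_by_day (
  line STRING
)
PARTITIONED BY (dt STRING);

-- The partition column must come last in the SELECT list;
-- Hive creates one partition per distinct dt value it sees.
INSERT OVERWRITE TABLE events_by_day PARTITION (dt)
SELECT line, dt
FROM events_staging;
```

Note that this route copies the data into directories that Hive manages itself, so it fits best when you are free to let Hive own the layout.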

Remember that you will have directories like:

/date=YYYYMMDD/fixedname/files

So you have to tell Hive to pick up all the data in the subdirectories recursively. You should set the following flag (here there is a better explanation):

SET mapred.input.dir.recursive=true;

Finally, you will be able to query by date and get all the data in the subdirectories for the date you specified in the query (/date=YYYYMMDD/...).
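For illustration, a query against a single day might look like the sketch below. The table `events` and the partition column `dt` are hypothetical names. The `hive.mapred.supports.subdirectories` setting is an extra assumption on my part: some Hive versions need it alongside the recursive flag before they will read nested directories, so check what your version requires.

```sql
SET mapred.input.dir.recursive=true;
-- Assumption: may also be needed, depending on the Hive version.
SET hive.mapred.supports.subdirectories=true;

-- List the partitions the metastore knows about.
SHOW PARTITIONS events;

-- Filtering on the partition column limits the scan to that day's directory.
SELECT COUNT(*)
FROM events
WHERE dt = '20130305';
```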

Hope this helps you.

Upvotes: 1
