Suppose I have the following files on HDFS:
data1-2018-01-01.txt
data1-2018-01-02.txt
data1-2018-01-03.txt
data1-2018-01-04.txt
data1-2018-01-06.txt
Now I want to query the files based on their date:
select * from mytable where `date` > '2018-01-03' and `date` < '2018-01-06';
My question: is it possible to create an external table over only the files that satisfy this query? If not, is there a workaround?
I know I could use partitions, but then I would have to add the data manually whenever a new data set arrives.
Put those files into a directory and create a new table on top of it. Hive also has the INPUT__FILE__NAME virtual column, which you can use for filtering:
where INPUT__FILE__NAME like '%2018-01-03%'
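A minimal sketch of the directory-plus-filter approach, assuming the files have been moved into a hypothetical HDFS directory /data/mydata and that each line of text is read into a single STRING column (the table name, column name and path are illustrative, not from the question):

```sql
-- Hypothetical table over the directory that holds the data1-*.txt files.
-- LOCATION must point to a directory; Hive reads every file inside it.
CREATE EXTERNAL TABLE mytable (
  line STRING  -- each text line lands in this single column
)
STORED AS TEXTFILE
LOCATION '/data/mydata';

-- INPUT__FILE__NAME is the full path of the file each row was read from,
-- so a LIKE pattern on it restricts the query to matching files.
SELECT INPUT__FILE__NAME, line
FROM mytable
WHERE INPUT__FILE__NAME LIKE '%2018-01-03%';
```

New files dropped into the same directory are picked up by the next query without any DDL change, which is the main advantage over manually managed partitions.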
It is also possible to use substr or regexp_extract to extract the date from the file name, and then filter on it with IN, or with > and <.
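A sketch of the range filter from the question, assuming every file name ends in yyyy-MM-dd.txt (as in the listing above) and reusing the hypothetical mytable from the previous sketch. Because the dates are zero-padded yyyy-MM-dd strings, plain string comparison orders them correctly:

```sql
SELECT *
FROM (
  SELECT t.*,
         -- Capture group 1 is the yyyy-MM-dd part of the file name.
         regexp_extract(INPUT__FILE__NAME,
                        'data1-([0-9]{4}-[0-9]{2}-[0-9]{2})[.]txt$', 1) AS file_date
         -- substr alternative: the last 14 characters are 'yyyy-MM-dd.txt',
         -- so substr(INPUT__FILE__NAME, -14, 10) returns the same date.
  FROM mytable t
) s
WHERE s.file_date > '2018-01-03'
  AND s.file_date < '2018-01-06';
  -- or, for specific days: WHERE s.file_date IN ('2018-01-04', '2018-01-05')
```

Note that Hive still lists every file in the directory; the filter discards rows from non-matching files, so this is a convenience rather than the file pruning that partitions give you.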