Reputation: 211
I want to create a Hive table where the input textfiles are traversed onto multiple sub-directories in hdfs. So example I have in hdfs:
/testdata/user/Jan/part-0001
/testdata/user/Feb/part-0001
/testdata/user/Mar/part-0001
and so on...
If i want to create a table user in hive, but have it be able to traverse the sub-directories of user, can that be done? I tried something like this, but doesn't work;
CREATE EXTERNAL TABLE users (id int, name string)
STORED AS TEXTFILE LOCATION '/testdata/user/*'
I thought adding the wildcard would work but doesn't. When I tried not using wildcard still does not work. However, if I copy the files into the root directory of user, then it works. Is there no way for Hive to traverse to the child-directories, and grab those files?
Upvotes: 17
Views: 26736
Reputation: 1
Don't put * after the /testdata/user/ because path hive will take all sub directories automatically.
If you want to make partitions then make the HDFS folder like /testdata/user/year=dynamicyear/month=dynamicmonth/date=dynamicdate
After creating the table with partition then use msck repair table tablename.
CREATE EXTERNAL TABLE users (id int, name string)
STORED AS TEXTFILE LOCATION '/testdata/user/'
Upvotes: 0
Reputation: 1049
I ended up using a shell script like below for a use case where the sub-directories are not known a-priori.
#!/bin/bash
hive -e "CREATE EXTERNAL TABLE users (id int, name string) PARTITIONED BY (month string) STORED AS TEXTFILE LOCATION '/testdata/user/'; "
hscript=""
for part in `hadoop fs -ls /testdata/user/ | grep -v -P "^Found"|grep -o -P "[a-zA-Z]{3}$"`;
do
echo $part
tmp="ALTER TABLE users ADD PARTITION(month='$part');"
hscript=$hscript$tmp
done;
hive -e "$hscript"
Upvotes: 7
Reputation: 4540
You can create an external table, then add subfolders as partitions.
CREATE EXTERNAL TABLE test (id BIGINT) PARTITIONED BY ( yymmdd STRING);
ALTER TABLE test ADD PARTITION (yymmdd = '20120921') LOCATION 'loc1';
ALTER TABLE test ADD PARTITION (yymmdd = '20120922') LOCATION 'loc2';
Upvotes: 26
Reputation: 1038
CREATE EXTERNAL TABLE user (id int, name string);
LOAD DATA INPATH "/testdata/user/*/*" INTO TABLE users;
Upvotes: 0
Reputation: 6646
Hive uses subdirectories as partitions of the data, so simply:
CREATE EXTERNAL TABLE users (id int, name string) PARTITIONED BY (month string)
STORED AS TEXTFILE LOCATION '/testdata/user/'
That should do it for you.
Upvotes: 5