user706794
user706794

Reputation: 211

Hive: Table creation with multi-files with multiple directories

I want to create a Hive table where the input textfiles are traversed onto multiple sub-directories in hdfs. So example I have in hdfs:

    /testdata/user/Jan/part-0001
    /testdata/user/Feb/part-0001
    /testdata/user/Mar/part-0001
and so on...

If i want to create a table user in hive, but have it be able to traverse the sub-directories of user, can that be done? I tried something like this, but doesn't work;

CREATE EXTERNAL TABLE users (id int, name string) 
STORED AS TEXTFILE LOCATION '/testdata/user/*'  

I thought adding the wildcard would work but doesn't. When I tried not using wildcard still does not work. However, if I copy the files into the root directory of user, then it works. Is there no way for Hive to traverse to the child-directories, and grab those files?

Upvotes: 17

Views: 26736

Answers (5)

chinmaya Mohanty
chinmaya Mohanty

Reputation: 1

Don't put * after the /testdata/user/ because path hive will take all sub directories automatically.

If you want to make partitions then make the HDFS folder like /testdata/user/year=dynamicyear/month=dynamicmonth/date=dynamicdate

After creating the table with partition then use msck repair table tablename.

CREATE EXTERNAL TABLE users (id int, name string) 
STORED AS TEXTFILE LOCATION '/testdata/user/'

Upvotes: 0

Dhanesh
Dhanesh

Reputation: 1049

I ended up using a shell script like below for a use case where the sub-directories are not known a-priori.

#!/bin/bash
hive -e "CREATE EXTERNAL TABLE users (id int, name string) PARTITIONED BY (month string) STORED AS TEXTFILE LOCATION '/testdata/user/'; "

hscript=""

for part in `hadoop fs -ls /testdata/user/ | grep -v -P "^Found"|grep -o -P "[a-zA-Z]{3}$"`;
do

echo $part
tmp="ALTER TABLE users ADD PARTITION(month='$part');"
hscript=$hscript$tmp
done;

hive -e "$hscript"

Upvotes: 7

OpenGG
OpenGG

Reputation: 4540

You can create an external table, then add subfolders as partitions.

CREATE EXTERNAL TABLE test (id BIGINT) PARTITIONED BY ( yymmdd STRING);
ALTER TABLE test ADD PARTITION (yymmdd = '20120921') LOCATION 'loc1';
ALTER TABLE test ADD PARTITION (yymmdd = '20120922') LOCATION 'loc2';

Upvotes: 26

Kai Zhang
Kai Zhang

Reputation: 1038

CREATE EXTERNAL TABLE user (id int, name string);
LOAD DATA INPATH "/testdata/user/*/*" INTO TABLE users;

Upvotes: 0

Steve Severance
Steve Severance

Reputation: 6646

Hive uses subdirectories as partitions of the data, so simply:

CREATE EXTERNAL TABLE users (id int, name string) PARTITIONED BY (month string)
STORED AS TEXTFILE LOCATION '/testdata/user/'  

That should do it for you.

Upvotes: 5

Related Questions