user1956609
user1956609

Reputation: 2202

Create External Table atop pre-partitioned data

I have data that looks like this:

/user/me/output/
  key1/
    part_00000
    part_00001
  key2/
    part_00000
    part_00001
  key3/
    part_00000
    part_00001

The data is pre-partitioned by "key_", and the "part_*" files contains my data in the form "a,b,key_". I create an external table:

CREATE EXTERNAL TABLE tester (
  a STRING,
  b INT
)
PARTITIONED BY (key STRING)
ROW FORMAT 
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/me/output/';

But a SELECT * gives no output. How can I create an external table that will read in this partitioned data?

Upvotes: 2

Views: 244

Answers (1)

Amar
Amar

Reputation: 3845

You will have to change your directory structure to make sure that hive reads the folders. It should be something like this.

/user/me/output/
  key=key1/
    part_00000
    part_00001
  key=key2/
    part_00000
    part_00001
  key=key3/
    part_00000
    part_00001

Once this is done you can create a table on top of this using the query you mentioned.

CREATE EXTERNAL TABLE tester (
  a STRING,
  b INT
)
PARTITIONED BY (key STRING)
ROW FORMAT 
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/me/output/';

You will also have to explicitly add the partitions or do a msck repair on the table to load the partitions with hive metadata. Any of these would do:

msck repair table tester;

OR

Alter table tester ADD PARTITION (key = 'key1');
Alter table tester ADD PARTITION (key = 'key2');
Alter table tester ADD PARTITION (key = 'key3');

Once you have done this, queries would return the output as present in your folders.

Upvotes: 2

Related Questions