user182944
user182944

Reputation: 8067

HIVE: Empty buckets getting created after partitioning in HDFS

I was trying to create Partition and buckets using HIVE.

For setting some of the properties:

set hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

Below is the code for creating the table:

CREATE TABLE transactions_production
( id string,
dept string,
category string,
company string,
brand string,
date1 string,
productsize int,
productmeasure string,
purchasequantity int,
purchaseamount double)
PARTITIONED BY (chain string) clustered by(id) into 5 buckets
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Below is the code for inserting data into the table:

INSERT OVERWRITE TABLE transactions_production PARTITION (chain)
select id, dept, category, company, brand, date1, productsize, productmeasure,
purchasequantity, purchaseamount, chain from transactions_staging;

What went wrong:

Partitions and buckets are getting created in HDFS but the data is present only in the 1st bucket of all the partitions; all the remaining buckets are empty.

Please let me know what i did wrong and how to resolve this issue.

Upvotes: 0

Views: 890

Answers (1)

madhu
madhu

Reputation: 1170

When using bucketing, Hive comes up with a hash of the clustered by value (here you use id) and splits the table into that many flat files inside partitions.

Because the table is split up by the hashes of the id's the size of each split is based on the values in your table.

If you have no values that would get mapped to the buckets other than the first bucket, all those flat files will be empty.

Upvotes: 1

Related Questions