Reputation: 97
Can we make a table having both partitioning and bucketing in hive ?
Upvotes: 0
Views: 2945
Reputation: 3261
Yes.This is straight forward.
try something below:
CREATE TABLE IF NOT EXISTS employee_partition_bucket
(
employeeID Int,
firstName String,
designation String,
salary Int
)
PARTITIONED BY (department string)
CLUSTERED BY (designation) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
In this example I Have created partition by department and bucket by designation
Hopw this will help you
Upvotes: 1
Reputation: 36
you can !! In that case, you will be having buckets inside partitioned data !
Upvotes: 0
Reputation: 1231
Yes.
Partitioning is you data is divided into number of directories on HDFS. Each directory is a partition. For example, if your table definition is like
CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)
COMMENT 'A bucketed copy of user_info'
PARTITIONED BY(ds STRING)
CLUSTERED BY(user_id) INTO 256 BUCKETS;
Then you'll have directories on hdfs like
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/
/user/hive/warehouse/user_info_bucketed/ds=2011-01-13/
Bucketing is about how your data is distributed inside a partition, So you'll have files on hdfs like
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000000_0
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000000_1
...
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000000_255
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000000_0
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000000_1
...
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000000_255
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables http://www.hadooptpoint.com/hive-buckets-optimization-techniques/
Upvotes: 2