Puneeth
Puneeth

Reputation: 19

Hive, Bucketing for the partitioned table

This is my script:

--table without partition

drop table if exists ufodata;
create table ufodata ( sighted string, reported string, city string, shape string, duration string, description string )
row format delimited
fields terminated by '\t'
Location '/mapreduce/hive/ufo';

--load my data in ufodata

load data local inpath '/home/training/downloads/ufo_awesome.tsv' into table ufodata;

--create partition table
drop table if exists partufo;
create table partufo ( sighted string, reported string, city string, shape string, duration string, description string )
partitioned by ( year string )
clustered by (year) into 6 buckets
row format delimited
fields terminated by '/t';

--by default dynamic partition is not set
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
--by default bucketing is false
set hive.enforcebucketing=true;

--loading mydata
insert overwrite table partufo
partition (year)
select sighted, reported, city, shape, min, description, SUBSTR(TRIM(sighted), 1,4) from ufodata;

Error message:

FAILED: Error in semantic analysis: Invalid column reference

I tried bucketing for my partitioned table. If I remove "clustered by (year) into 6 buckets" the script works fine. How do I bucket the partitioned table

Upvotes: 0

Views: 1674

Answers (3)

steve
steve

Reputation: 31

when you're doing dynamic partition, create a temporary table with all the columns (including your partitioned column) and load data into temporary table.

create actual partitioned table with partition column. While you are loading data from temporary table the partitioned column should be in the last in the select clause.

Upvotes: 0

madhu
madhu

Reputation: 1170

There is an important thing we should consider while doing bucketing in hive.

The same column name cannot be used for both bucketing and partitioning. The reason is as follows:

Clustering and Sorting happens within a partition. Inside each partition there will be only one value associated with the partition column(in your case it is year)therefore there will not any be any impact on clustering and sorting. That is the reason for your error....

Upvotes: 1

Dinesh K R
Dinesh K R

Reputation: 1

You can use the below syntax to create bucketing table with partition.

CREATE TABLE bckt_movies
(mov_id BIGINT , mov_name STRING ,prod_studio STRING, col_world DOUBLE , col_us_canada DOUBLE , col_uk DOUBLE , col_aus DOUBLE)
PARTITIONED BY (rel_year STRING)
CLUSTERED BY(mov_id) INTO 6 BUCKETS; 

Upvotes: 0

Related Questions