Carol.Kar
Carol.Kar

Reputation: 5345

Hive - partition by year

I am partitioning by year in hive. I have created a script:

DROP TABLE movies_byYear;

CREATE TABLE movies_byYear (title STRING, full_name STRING, ep_name STRING, type STRING, ep_num STRING, suspended BOOLEAN) PARTITIONED BY (year INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

INSERT OVERWRITE TABLE movies_byYear PARTITION (year='2013') SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';

However, when using: SELECT COUNT(*) FROM movies WHERE year='2013';

I do not get all movies by year 2013 back, instead I get all movies back.

Is it also possible to let hive decide where to partition?

I really appreciate your answer!!!

UPDATE

When adding year I get:

INSERT OVERWRITE TABLE movies_byYear PARTITION (year=2013) SELECT title, full_name, ep_name, type, ep_num, suspended, year FROM movies WHERE year=2013;

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different '2013': Table insclause-0 has 6 columns, but query has 7 columns.

Upvotes: 0

Views: 3615

Answers (1)

rkh
rkh

Reputation: 863

When inserting, you insert:

SELECT title, full_name, ep_name, type, ep_num, suspended

Add year to that... Currently your year field in movies_byYear is null...

When you specify partition by year in your create table statement in hive, year will be a column in the table!!!

UPDATE

Replace this

INSERT OVERWRITE TABLE movies_byYear PARTITION (year='2013') SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';

with this:

INSERT OVERWRITE TABLE movies_byYear PARTITION (year=2013) SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';

Namely, remove the single quotes around the year value in Partition...

Upvotes: 2

Related Questions