Reputation: 5345
I am partitioning by year in hive. I have created a script:
DROP TABLE movies_byYear;
CREATE TABLE movies_byYear (title STRING, full_name STRING, ep_name STRING, type STRING, ep_num STRING, suspended BOOLEAN) PARTITIONED BY (year INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
INSERT OVERWRITE TABLE movies_byYear PARTITION (year='2013') SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';
However, when using: SELECT COUNT(*) FROM movies WHERE year='2013';
I do not get all movies by year 2013 back, instead I get all movies back.
Is it also possible to let hive decide where to partition?
I really appreciate your answer!!!
UPDATE
When adding year
I get:
INSERT OVERWRITE TABLE movies_byYear PARTITION (year=2013) SELECT title, full_name, ep_name, type, ep_num, suspended, year FROM movies WHERE year=2013;
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different '2013': Table insclause-0 has 6 columns, but query has 7 columns.
Upvotes: 0
Views: 3615
Reputation: 863
When inserting, you insert:
SELECT title, full_name, ep_name, type, ep_num, suspended
Add year to that... Currently your year
field in movies_byYear is null...
When you specify partition by year
in your create table statement in hive, year
will be a column in the table!!!
UPDATE
Replace this
INSERT OVERWRITE TABLE movies_byYear PARTITION (year='2013') SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';
with this:
INSERT OVERWRITE TABLE movies_byYear PARTITION (year=2013) SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';
Namely, remove the single quotes around the year value in Partition...
Upvotes: 2