Reputation: 783
I have data in one Hive table and would like to load data into another hive table.
The source table is reg_logs which has 2 partitions, date and hour. The data gets loaded into this table hourly. The schema is:
CREATE EXTERNAL TABLE IF NOT EXISTS reg_logs (
id int,
region_code int,
count int
)
PARTITIONED BY (utc_date STRING, utc_hour STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/ad_data/raw/reg_logs';
The destination table is reg_logs_org all I would like to do is copy all data from reg_logs beside utc_hour column.
Schema I created is: (please correct if I am wrong)
CREATE EXTERNAL TABLE IF NOT EXISTS reg_logs_org (
id int,
region_code int,
count int
)
PARTITIONED BY (utc_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/ad_data/reg_logs_org';
Insert data into reg_logs_org from reg_logs:
insert overwrite table reg_logs_org
select id, region_code, sum(count), utc_date
from
reg_logs
group by
utc_date, id, region_code
error message:
FAILED: SemanticException 1:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'reg_logs_org'
==
Thank you,
Rio
Upvotes: 20
Views: 78391
Reputation: 13
This will not work if the first partition is in the source table is empty , I mean there are no records in the first partition of source table. In this case I suggest to insert a dummy record along with the first partition in a separate insert script and truncate that data later.
Upvotes: 0
Reputation: 27793
Create a copy of your table
CREATE TABLE my_table_backup LIKE my_table;
Enable dynamic partitioning
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.mapred.mode = nonstrict;
Copy the table
INSERT OVERWRITE TABLE my_table_backup PARTITION (ds)
SELECT * FROM my_table
WHERE ds = ds;
The where clause is needed if you use strict mode.
Upvotes: 28
Reputation: 993
It is mandatory to use partitioned column as last column while inserting the data. Hive will take the data which is there in last column.
So based on that insert query should be:-
insert overwrite table reg_logs_org PARTITION (utc_date)
select id, region_code, sum(count), utc_date
from
reg_logs
group by
utc_date, id, region_code
From the documentation:
The dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause
Upvotes: 3
Reputation: 14871
In some cases you may need to set hive.exec.dynamic.partition.mode=nonstrict to be able to insert data into partitioned table, for example,
CREATE TABLE hivePartitionedTable
(
c1 int
, c2 int
, c3 string
)
PARTITIONED BY (year int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS SEQUENCEFILE
;
then this INSERT will work:
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO hivePartitionedTable PARTITION (year)
VALUES (1,2,'3', 1999);
Upvotes: 6
Reputation: 5450
this is because you are missing the partition info in your insert query
insert overwrite table reg_logs_org PARTITION (utc_date)
select id, region_code, sum(count), utc_date
from
reg_logs
group by
utc_date, id, region_code
Upvotes: 27