Reputation: 12900
I was working on one solution and found that in some particular cases, hive insert overwrite
truncates the table however in few cases it doesn't. Would someone please explain me what it's behaving like that?
to explain this, I am table two tables, source and target and trying to insert data into master from source table using insert overwrite
When Source Table has partition
if source table has partition and if you write a condition such that partition does not exist then it won't truncate the master table.
create table source (name String) partitioned by (age int);
insert into source partition (age) values("gaurang", 11);
create table target (name String, age int);
insert into target partition (age) values("xxx", 99);
following query won't truncate the table even if select doesn't return anything.
insert overwrite table temp.test12 select * from temp.test11 where name="Ddddd" and age=99;
However, following query will truncate the table.
insert overwrite table temp.target select * from temp.test11 where name="Ddddd" and age=11;
it makes sense in the first case, as the partition(age=99) does not exist hence it should stop the execution of the query further. However this is my assumption, not sure what exactly happens.
When Source Table Doesn't have partition, but Target has in this case target table won't be truncated even if select statement from source table returns 0 rows.
use temp;
drop table if exists source1;
drop table if exists target1;
create table source1 (name String, age int);
create table target1 (name String) partitioned by (age int);
insert into source1 values ("gaurang", 11);
insert into target1 partition(age) values("xxx", 99);
select * from source1;
select * from target1;
Following query won't truncate the table even if no data found in select statement.
insert overwrite table temp.target1 partition(age) select * from temp.source1 where age=90;
When Source or Target don't have partition
In this case if I try to insert overwrite target and select statement doesn't return any row then target table will be truncated. check the example below.
use temp;
drop table if exists source1;
drop table if exists target1;
create table source1 (name String, age int);
create table target1 (name String, age int);
insert into source1 values ("gaurang", 11);
insert into target1 values("xxx", 99);
select * from source1;
select * from target1;
Following Query will truncate the target table.
insert overwrite table temp.target1 select * from temp.source1 where age=90;
Upvotes: 2
Views: 4006
Reputation: 38290
Better use term 'overwrite'
instead of truncate
, because it is what exactly happening during insert overwrite
.
When you write overwrite table temp.target1 partition(age)
you instructs Hive to overwrite partitions, not all the target1 table, only those partitions which will be returned by select.
Empty dataset will not overwrite partitions in dynamic partition mode. because the partition to overwrite is unknown, partition should be taken from dataset, and the dataset is empty, nothing to overwrite then.
And in case of not partitioned table, it is already known that it should overwrite all the table, does not matter, empty dataset or not.
Partition column in insert overwrite
statement should be the last. And the list of partitions to be overwritten in target = list of values in partition column, returned by dataset, does not matter how the source table is partitioned (you can select target partition column from any source table column, calculate it or use a constant), only what was returned does matter.
Upvotes: 1