Reputation: 7951
I have a set of CSV files in a HDFS path and I created an external Hive table, let's say table_A, from these files. Since some of the entries are redundant, I tried creating another Hive table based on table_A, say table_B, which has distinct records. I was able to create table_B as a non-external table (Hive warehouse). I want to know if I can create table_B as an external table? If this is possible, will it copy the records from table_A and create its own table_B store on a specified path (preferably as CSV also)?
Upvotes: 0
Views: 2184
Reputation: 729
I am presuming you want to select distinct data from "uncleaned" table and insert into "cleaned" table.
CREATE EXTERNAL TABLE `uncleaned`( `a` int, `b` string, `c` string, `d` string, `e` bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/external/uncleaned'
create another table, it can be external or not(doesn't matter).
CREATE EXTERNAL TABLE `cleaned`( `a` int, `b` string, `c` string, `d` string, `e` bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/external/cleaned'
Read from first table and you can insert it by
insert overwrite table cleaned select distinct a,b,c,d,e from uncleaned;
Upvotes: 2