Akash
Akash

Reputation: 110

Insert values from file to an existing table on hive

I am new to hadoop ecosystem. I was trying to create hive table from CSV file using the below query.

CREATE EXTERNAL TABLE IF NOT EXISTS proxy_data(
  date_time TIMESTAMP,time_taken INT, c_ip STRING,
  sc_status INT, s_action STRING, sc_bytes INT,
  cs_bytes INT, cs_method STRING, cs_uri STRING,
  cs_host STRING, uri_port INT, uri_path STRING,
  uri_query STRING, username STRING, auth STRING,
  supplier_name STRING, content_type STRING, referer STRING,
  user_agent STRING, filter_result STRING, categories STRING,
  x_virus_id STRING, proxy_ip STRING
)
COMMENT 'Proxy logs' 
LOCATION '/user/admin'
tblproperties ("skip.header.line.count"="1");

This query actually created a table proxy_data and populated the values present in the csv files which are located in the prescribed location.

Now, I want to append values from another set of CSVs to the same table(it should skip the headings present in the csv file). I checked for various solutions, but nothing is meeting my need.

Upvotes: 2

Views: 8480

Answers (3)

Satya Pavan
Satya Pavan

Reputation: 314

You may follow this approach:

  1. Create a staging table (temporary table) with this property - skip.header.line.count=1
  2. Create a main table with same schema (no need to use skip.header.line.count clause in this table).
  3. Every time you have a new file, load overwrite into the staging table
  4. Then, load append staging table's data into the main table.

    create table <my_table_stg>(col1 data_type1, col2, data_type2...)
    row format delimited fields terminated by ','
    tblproperties ("skip.header.line.count"="1");
    
    create table <my_table>(col1 data_type1, col2, data_type2...);
    
    load data inpath '/file/location/my_file.csv' overwrite into table <my_table_stg>;
    
    insert into table <my_table> select * from <my_table_stg>;
    

P.S: Your table syntax doesn't haverow format delimited clause. Please make sure you add it as shown above

Upvotes: 2

Mahesh Mogal
Mahesh Mogal

Reputation: 658

You can add a property to your table which will skip the first row of csv. "skip.header.line.count"="1"

In your case,

Alter Table proxy_data SET TBLPROPERTIES ("skip.header.line.count"="1").

Upvotes: 2

HISI
HISI

Reputation: 4797

Could you try this:

hive>  LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' INTO TABLE proxy_data;

Upvotes: 0

Related Questions