Reputation: 110
I am new to hadoop ecosystem. I was trying to create hive table from CSV file using the below query.
CREATE EXTERNAL TABLE IF NOT EXISTS proxy_data(
date_time TIMESTAMP,time_taken INT, c_ip STRING,
sc_status INT, s_action STRING, sc_bytes INT,
cs_bytes INT, cs_method STRING, cs_uri STRING,
cs_host STRING, uri_port INT, uri_path STRING,
uri_query STRING, username STRING, auth STRING,
supplier_name STRING, content_type STRING, referer STRING,
user_agent STRING, filter_result STRING, categories STRING,
x_virus_id STRING, proxy_ip STRING
)
COMMENT 'Proxy logs'
LOCATION '/user/admin'
tblproperties ("skip.header.line.count"="1");
This query actually created a table proxy_data and populated the values present in the csv files which are located in the prescribed location.
Now, I want to append values from another set of CSVs to the same table(it should skip the headings present in the csv file). I checked for various solutions, but nothing is meeting my need.
Upvotes: 2
Views: 8480
Reputation: 314
You may follow this approach:
skip.header.line.count
clause in this table).Then, load append staging table's data into the main table.
create table <my_table_stg>(col1 data_type1, col2, data_type2...)
row format delimited fields terminated by ','
tblproperties ("skip.header.line.count"="1");
create table <my_table>(col1 data_type1, col2, data_type2...);
load data inpath '/file/location/my_file.csv' overwrite into table <my_table_stg>;
insert into table <my_table> select * from <my_table_stg>;
P.S: Your table syntax doesn't haverow format delimited
clause. Please make sure you add it as shown above
Upvotes: 2
Reputation: 658
You can add a property to your table which will skip the first row of csv. "skip.header.line.count"="1"
In your case,
Alter Table proxy_data SET TBLPROPERTIES ("skip.header.line.count"="1").
Upvotes: 2
Reputation: 4797
Could you try this:
hive> LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' INTO TABLE proxy_data;
Upvotes: 0