Reputation: 165
I would like to insert the pig output into Hive tables(tables in Hive is already created with the exact schema).Just need to insert the output values into table. I dont want to the usual method, wherein I first store into a file, then read that file from Hive and then insert into tables. I need to reduce that extra hop which is done.
Is it possible. If so please tell me how this can be done ?
Thanks
Upvotes: 5
Views: 12643
Reputation: 1
The best way is to use HCatalog and write the data in hive table.
STORE final_data INTO 'Hive_table_name' using org.apache.hive.hcatalog.pig.HCatStorer();
But before storing the data, make sure the columns in the 'final_data' dataset is perfectly matched and mapped with the schema of the table.
And run your pig script like this : pig script.pig -useHCatalog
Upvotes: 0
Reputation: 352
you can store it using Hcatalog
STORE D INTO 'tablename' USING org.apache.hive.hcatalog.pig.HCatStorer();
see below link https://acadgild.com/blog/loading-and-storing-hive-data-into-pig
Upvotes: 0
Reputation: 2415
There are two approaches explained below with 'Employee' table example to store pig output into hive table. (Prerequisite is that hive table should be already created)
A = LOAD 'EMPLOYEE.txt' USING PigStorage(',') AS(EMP_NUM:int,EMP_NAME:chararray,EMP_PHONE:int);
Approach 1: Using Hcatalog
// dump pig result to Hive using Hcatalog
store A into 'Empdb.employee' using org.apache.hive.hcatalog.pig.HCatStorer();
(or)
Approach 2: Using HDFS physical location
// dump pig result to external hive warehouse location
STORE A INTO 'hdfs://<<nmhost>>:<<port>>/user/hive/warehouse/Empdb/employee/' USING PigStorage(',')
;
Upvotes: 3
Reputation: 3599
Ok. Create a external hive table with a schema layout somewhere in HDFS directory. Lets say
create external table emp_records(id int,
name String,
city String)
row formatted delimited
fields terminated by '|'
location '/user/cloudera/outputfiles/usecase1';
Just create a table like above and no need to load any file into that directory.
Now write a Pig script that we read data for some input directory and then when you store the output of that Pig script use as below
A = LOAD 'inputfile.txt' USING PigStorage(',') AS(id:int,name:chararray,city:chararray);
B = FILTER A by id > = 678933;
C = FOREACH B GENERATE id,name,city;
STORE C INTO '/user/cloudera/outputfiles/usecase1' USING PigStorage('|');
Ensure that destination location and delimiter and schema layout of final FOREACH statement in you Pigscript matches with Hive DDL schema.
Upvotes: 6