Reputation: 3071
I have a pig relation like below:
FINAL= {input_md5::type: chararray,input_md5::name: chararray,input_md5::id: long,input_md5::age: chararray,test_1:: type: chararray,test_2::name:chararray}
I am trying to store all columns for input_md5
relation to a hive table.
like all input_md5::type: chararray,input_md5::name: chararray,input_md5::id: long,input_md5::age: chararray
not taking test_1:: type: chararray,test_2::name:chararray
is there any command in pig which filters only columns of input_md5
.Something like below:
STORE= FOREACH FINAL GENERATE all input_md5::type .
I know that pig have :
FOREACH FINAL GENERATE all input_md5::type as type
syntax, but i have many columns so I cannot use as
in my code.
Because when i try:
STORE= FOREACH FINAL GENERATE input_md5::type .. bus_input_md5::name;
Pig throws an error:
org.apache.hive.hcatalog.common.HCatException : 2007 : Invalid column position in partition schema : Expected column <type> at position 1, found column <input_md5::type>
Thanks in advance,
Upvotes: 1
Views: 1238
Reputation: 1280
I implemented Neethu's example this way. May have typos, but it shows how to implement this idea.
tableA = LOAD 'default.tableA' USING org.apache.hive.hcatalog.pig.HCatLoader();
tableB = LOAD 'default.tableB' USING org.apache.hive.hcatalog.pig.HCatLoader();
--load empty table
finalTable = LOAD 'default.finalTable' USING org.apache.hive.hcatalog.pig.HCatLoader();
--example operations that end up with '::' in column names
g = group tableB by (id);
j = JOIN tableA by id LEFT, g by group;
result = foreach j generate tableA::id, tableA::col2, g::tableB;
--union empty finalTable and result
result2 = union finalTable, result;
--bob's your uncle
STORE result2 INTO 'finalTable' USING org.apache.hive.hcatalog.pig.HCatStorer();
Thanks to Neethu!
Upvotes: 2
Reputation: 3071
Resolved this issue , below is the fix:
Create a relation with some filter condition as below:
DUMMY_RELATION= FILTER SOURCE_TABLE BY type== '';
(I took a column named type ,this can be filtered by any column in the table , all that matters is we need its schema)
FINAL_DATASET= UNION DUMMY_RELATION,SCHEMA_1,SCHEMA_2;
(this new DUMMY_RELATION
n should be placed 1st in the union)
Now you no more have ::
operator And your column names would match hive table's column names, provided your source table (to DUMMY_RELATION) and target table have same column order.
Thanks to myself :)
Upvotes: 4