Neethu Lalitha
Neethu Lalitha

Reputation: 3071

PIG: How to remove '::' in the column name

I have a pig relation like below:

FINAL= {input_md5::type: chararray,input_md5::name: chararray,input_md5::id: long,input_md5::age: chararray,test_1:: type: chararray,test_2::name:chararray}

I am trying to store all columns for input_md5 relation to a hive table. like all input_md5::type: chararray,input_md5::name: chararray,input_md5::id: long,input_md5::age: chararray not taking test_1:: type: chararray,test_2::name:chararray

is there any command in pig which filters only columns of input_md5.Something like below:

STORE= FOREACH FINAL GENERATE all input_md5::type . I know that pig have :

FOREACH FINAL GENERATE all input_md5::type as type syntax, but i have many columns so I cannot use as in my code.

Because when i try: STORE= FOREACH FINAL GENERATE input_md5::type .. bus_input_md5::name;

Pig throws an error:

org.apache.hive.hcatalog.common.HCatException : 2007 : Invalid column position in partition schema : Expected column <type> at position 1, found column <input_md5::type>

Thanks in advance,

Upvotes: 1

Views: 1238

Answers (2)

user584583
user584583

Reputation: 1280

I implemented Neethu's example this way. May have typos, but it shows how to implement this idea.

tableA = LOAD 'default.tableA' USING org.apache.hive.hcatalog.pig.HCatLoader();
tableB = LOAD 'default.tableB' USING org.apache.hive.hcatalog.pig.HCatLoader();

--load empty table
finalTable = LOAD 'default.finalTable' USING org.apache.hive.hcatalog.pig.HCatLoader();

--example operations that end up with '::' in column names
g = group tableB by (id);
j = JOIN tableA by id LEFT, g by group;
result = foreach j generate tableA::id, tableA::col2, g::tableB;

--union empty finalTable and result
result2 = union finalTable, result;

--bob's your uncle
STORE result2 INTO 'finalTable' USING org.apache.hive.hcatalog.pig.HCatStorer();

Thanks to Neethu!

Upvotes: 2

Neethu Lalitha
Neethu Lalitha

Reputation: 3071

Resolved this issue , below is the fix:

Create a relation with some filter condition as below:

DUMMY_RELATION= FILTER SOURCE_TABLE BY type== ''; (I took a column named type ,this can be filtered by any column in the table , all that matters is we need its schema)

FINAL_DATASET= UNION DUMMY_RELATION,SCHEMA_1,SCHEMA_2;

(this new DUMMY_RELATIONn should be placed 1st in the union) Now you no more have :: operator And your column names would match hive table's column names, provided your source table (to DUMMY_RELATION) and target table have same column order.

Thanks to myself :)

Upvotes: 4

Related Questions