Reputation: 3652
My Pig script needs to pass data to the Java constructor:
UPCFIND = LOAD 'testdatabase.item' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (upc:chararray,description:chararray);
UPCDATA = FOREACH UPCFIND GENERATE upc,description;
DUMP UPCDATA;
//output:
(00001123456789," Table ")
(00000123456789," PICTURE ")
My UDF is:
loading = LOAD '/incoming/files/*' USING com.readingitems.loading.TheLoader(UPCDATA) as
(upc:chararray, description:chararray,
Can I pass this UPCDATA to my UDF, and if so, how would I get it into a HashMap where upc is the key and description is the value? Is this considered an ArrayList or a tuple? Thanks in advance!
The problem right now is passing this data into the Java constructor:
UPCFIND = LOAD 'testdatabase.item' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (upc:chararray,description:chararray);
UPCDATA = FOREACH UPCFIND GENERATE upc,description;
UPCDATA_SCALAR = GROUP UPCDATA ALL;
loading = LOAD 'files/incoming/*' USING com.readingitems.loading.TheLoader(UPCDATA_SCALAR)
Getting the error:
ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. org.apache.pig.tools.parameters.ParameterSubstitutionException: Undefined parameter : UPCDATA_SCALAR
Dumping UPCDATA_SCALAR produces the correct results.
The reason I'm doing this is to load a Hive table's data into a loader function that parses files. I need to compare the data in the files against the Hive table data in order to make changes and insert the results into a new table.
My loader function starts with:
public class TheLoader extends LoadFunc {
public TheLoader (DataBag item_master_stream) throws SQLException {
Upvotes: 0
Views: 411
Reputation: 5186
In your example, UPCDATA is a relation. In order to pass it into a function as an argument, you are going to have to convert it into a scalar. You can accomplish this with:
UPCDATA_SCALAR = GROUP UPCDATA ALL;
In Java, this will be represented as a DataBag of Tuples. You can read more about that here.
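As a rough sketch of the HashMap part of the question: a DataBag is iterable over Tuples, and each Tuple exposes its fields by position, so building the lookup is a single loop. To keep the example runnable without Pig on the classpath, the version below uses a plain Iterable of List rows as a stand-in for the bag; the class name UpcLookup and the method toMap are hypothetical, and it assumes each row is laid out as (upc, description). With Pig available, the rows would be org.apache.pig.data.Tuple and the field reads would be get(0) and get(1) on the tuple.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig or of the original loader.
class UpcLookup {

    // Stand-in for iterating a Pig DataBag: any Iterable of rows, where each
    // row is assumed to be laid out as (upc, description).
    static Map<String, String> toMap(Iterable<? extends List<?>> bag) {
        Map<String, String> byUpc = new HashMap<>();
        for (List<?> row : bag) {
            // Skip rows that are missing, too short, or have no key.
            if (row == null || row.size() < 2 || row.get(0) == null) {
                continue;
            }
            String upc = row.get(0).toString();
            Object description = row.get(1);
            byUpc.put(upc, description == null ? null : description.toString());
        }
        return byUpc;
    }

    public static void main(String[] args) {
        // Sample rows copied from the DUMP output in the question.
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList("00001123456789", " Table "),
                Arrays.<Object>asList("00000123456789", " PICTURE "));
        Map<String, String> byUpc = toMap(bag);
        System.out.println(byUpc.get("00001123456789"));
    }
}
```

Note that the descriptions in the sample output carry leading and trailing spaces; the sketch keeps them as-is, so trim them before comparing against file contents if that matters.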
It is worth keeping in mind that a GROUP ALL is really expensive, so you will want to project out every column that isn't critical to your UDF before grouping.
Upvotes: 1