Reputation: 17
I am getting a cast error after I join two datasets in a Pig script. I am using HDP 2.2. The error is:
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 0: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
I get the error when I try to DUMP or STORE the joined relation. Please advise.
My script is as follows:
complaint= load 'file1' using PigStorage('|');
extracted = foreach complaint generate $13 as complainant_first_name:chararray, $14 as complainant_last_name:chararray, $16 as hic:chararray;
filtered_com = filter extracted by hic IS NOT NULL;
mbr= load 'file2' using PigStorage(',');
extracted = foreach mbr generate $11 as first_nm:chararray, $12 as last_nm:chararray, $24 as medcr_nbr:chararray;
filtered_mbr = filter extracted by medcr_nbr is not null;
joined = join filtered_com by hic, filtered_mbr by medcr_nbr;
describe joined;
store joined into 'com_mbr' using PigStorage(',')
Upvotes: 0
Views: 1474
Reputation: 387
The error that you are witnessing is this:
*Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray incompatible with java.lang.String*
By default, when you load data into Pig without a schema, every field is stored as a bytearray. So to perform any string operation on a field, you need to cast it to chararray.
You can get the output either by using an explicit cast to chararray in the FOREACH statement, or by simply leaving the data as bytearray, as shown below:
complaint = LOAD 'sofile1.txt' USING PigStorage('|'); -- no schema given, so every field defaults to bytearray
extracted = FOREACH complaint GENERATE $0 AS complaint_first_name, $1 AS complaint_last_name, $2 AS hic;
filtered_com = FILTER extracted BY hic IS NOT NULL;
mbr = LOAD 'sofile2.txt' USING PigStorage(',');
extracted = FOREACH mbr GENERATE $0 AS first_nm, $1 AS last_nm, $2 AS medcr_nbr;
filtered_mbr = FILTER extracted BY medcr_nbr IS NOT NULL;
joined_data = JOIN filtered_com BY hic, filtered_mbr BY medcr_nbr;
DESCRIBE joined_data;
DESCRIBE prints the schema of the joined relation, and DUMP joined_data; should then print the results as well. Hope this helps.
Upvotes: 0
Reputation: 1170
We can specify the column data types in the load statement for file1:
complaint = load 'file1' using PigStorage('|') as (col0:chararray, col1:chararray, .........);
or
We can cast the columns to the required data types in the foreach block:
extracted = foreach complaint generate (chararray)$13 as complainant_first_name:chararray,
(chararray)$14 as complainant_last_name:chararray, (chararray)$16 as hic:chararray;
The same can be done for file2 as well. Hope this helps!!
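For reference, here is a minimal sketch of the whole script from the question with the explicit casts applied to both files. It assumes the same file names, delimiters and column positions as the question. The aliases com_fields and mbr_fields are made up here for illustration, so that the two projections no longer share the name extracted. I have not run this against the original data.

```pig
-- Load without a schema: every field starts out as bytearray.
complaint = LOAD 'file1' USING PigStorage('|');

-- Cast each projected field to chararray explicitly.
-- com_fields is a hypothetical alias, not from the original script.
com_fields = FOREACH complaint GENERATE
    (chararray)$13 AS complainant_first_name,
    (chararray)$14 AS complainant_last_name,
    (chararray)$16 AS hic;
filtered_com = FILTER com_fields BY hic IS NOT NULL;

mbr = LOAD 'file2' USING PigStorage(',');

-- Same treatment for the second file (mbr_fields is also hypothetical).
mbr_fields = FOREACH mbr GENERATE
    (chararray)$11 AS first_nm,
    (chararray)$12 AS last_nm,
    (chararray)$24 AS medcr_nbr;
filtered_mbr = FILTER mbr_fields BY medcr_nbr IS NOT NULL;

-- Both join keys are now chararray on each side.
joined = JOIN filtered_com BY hic, filtered_mbr BY medcr_nbr;
DESCRIBE joined;
STORE joined INTO 'com_mbr' USING PigStorage(',');
```

Running DESCRIBE on filtered_com and filtered_mbr before the join is a quick way to check that both key fields really are chararray.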
Upvotes: 1