Giovanni Botta

Reputation: 9816

Strange cast error in Pig/Hadoop

Using Pig 0.10.1, I have the following script:

br = LOAD 'cfs:///somefile';

SPLIT br INTO s0 IF (sp == 1), not_s0 OTHERWISE;
SPLIT not_s0 INTO s1 IF (adp >= 1.0), not_s1 OTHERWISE;
SPLIT not_s1 INTO s2 IF (p > 1L), not_s2 OTHERWISE;
SPLIT not_s2 INTO s3 IF (s > 0L), s4 OTHERWISE;

tmp0 = FOREACH s0 GENERATE b, 'x' as seg;
tmp1 = FOREACH s1 GENERATE b, 'y' as seg;
tmp2 = FOREACH s2 GENERATE b, 'z' as seg;
tmp3 = FOREACH s3 GENERATE b, 'w' as seg;
tmp4 = FOREACH s4 GENERATE b, 't' as seg;

out = UNION ONSCHEMA tmp0, tmp1, tmp2, tmp3, tmp4;

dump out;

The file loaded into br was generated by a previous Pig script and has an embedded schema (a .pig_schema file):

describe br
br: {b: chararray,p: long,afternoon: long,ddv: long,pa: long,t0002: long,t0204: long,t0406: long,t0608: long,t0810: long,t1012: long,t1214: long,t1416: long,t1618: long,t1820: long,t2022: long,t2200: long,browser_software: chararray,first_timestamp: long,last_timestamp: long,os: chararray,platform: chararray,sp: int,adp: double}

Some irrelevant fields were edited out of the schema above (I can't fully disclose the nature of the data at this time).

The script fails with the following error:

ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Integer cannot be cast to java.lang.Long

However, dumping s0, s1, s2, s3, s4 or tmp0, tmp1, tmp2, tmp3, tmp4 individually works flawlessly.

The Hadoop job tracker shows the following error 4 times:

java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.doComparison(EqualToExpr.java:116)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.getNext(EqualToExpr.java:83)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)

I also tried this snippet (instead of the original dump):

x = UNION s1,s2;
y = FOREACH x GENERATE b;
dump y;

and I get a different (but I assume related) error:

ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Double cannot be cast to java.lang.Long

with the job tracker error (repeated 4 times):

java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr.doComparison(GTOrEqualToExpr.java:111)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr.getNext(GTOrEqualToExpr.java:78)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:141)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)

I looked for known bugs involving UNION, with no luck. This is really puzzling. Any ideas?

Upvotes: 1

Views: 1595

Answers (2)

steve

Reputation: 31

When you perform a UNION between two or more relations, you have to make sure the datatypes of the fields are compatible.

This problem is caused by incompatible datatypes. To avoid it, declare your chararray as bytearray; that will get rid of the error.

Upvotes: 0

Giovanni Botta

Reputation: 9816

After further digging, it looks like this is a bug. I created a ticket for it.

Upvotes: 1
