Reputation: 123
I have a set of URLs and associated transaction times in Hadoop. I am trying to write a Pig script that gives me the total transaction time for each URL, but I get a ClassCastException every time I try to SUM the transaction times. This is my first time using Pig, so any help is appreciated; I can't figure out what I'm doing wrong.
Here is some output showing the URLs and transaction times:
grunt> DESCRIBE uLogUrls
uLogUrls: {url: chararray,et: int}
grunt> DUMP uLogUrls
(/index.jsp,344)
(/another/Access.jsp,517)
(/index.jsp,5)
(/another/NoAccess.jsp,4)
(/index.jsp,5)
(/index.jsp,4)
grunt> grps = GROUP uLogUrls BY url;
grunt> DUMP grps
(/index.jsp,{(/index.jsp,344),(/index.jsp,5),(/index.jsp,5),(/index.jsp,4)})
(/home/home.jsp,{(/home/home.jsp,11200)})
grunt> DESCRIBE grps
grps: {group: chararray,uLogUrls: {(url: chararray,et: int)}}
grunt> total_tx_time = FOREACH grps GENERATE group AS url, SUM(uLogUrls.et);
When I execute DUMP total_tx_time, I get:
28:05,370 [Thread-44] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2015-09-11 19:28:05,372 [Thread-44] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1410240575_0002
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: urlgrp: Local Rearrange[tuple] {chararray}(false) - scope-73 Operator Key: scope-73): org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Doubles
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: urlgrp: Local Rearrange[tuple]{chararray} (false) - scope-73 Operator Key: scope-73): org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Doubles
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:291)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1688)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1637)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Doubles
at org.apache.pig.builtin.AlgebraicDoubleMathBase.doTupleWork(AlgebraicDoubleMathBase.java:82)
at org.apache.pig.builtin.AlgebraicDoubleMathBase$Intermediate.exec(AlgebraicDoubleMathBase.java:106)
at org.apache.pig.builtin.AlgebraicDoubleMathBase$Intermediate.exec(AlgebraicDoubleMathBase.java:100)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:323)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:362)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:383)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:303)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
... 17 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at org.apache.pig.builtin.AlgebraicDoubleMathBase.doTupleWork(AlgebraicDoubleMathBase.java:75)
... 25 more
Any ideas what I'm doing wrong?
Thanks!
Upvotes: 0
Views: 820
Reputation: 123
The reason is that in the original FOREACH used to generate uLogUrls, I did not cast to double properly:
uLogUrls = FOREACH uLogs GENERATE logName AS url, runTime AS et:double;
The statement above is what caused the exception: AS et:double only declares the output schema for the field, it does not convert the underlying value, so et was still a chararray (notice there are no decimal places in the numbers below).
DUMP uLogUrls
(/index.jsp,344)
(/secur/blah.jsp,517)
(/index.jsp,5)
(/secur/blah.jsp,4)
(/index.jsp,5)
....snip....
But when I cast it like this:
grunt> uLogUrls = FOREACH uLogs GENERATE logName as url, (double)runTime as et;
grunt> DUMP uLogUrls
(/index.jsp,344.0)
(/secur/blah.jsp,517.0)
(/index.jsp,5.0)
(/secur/blah.jsp,4.0)
(/index.jsp,5.0)
...snip....
then the GROUP and SUM operations work. Thanks for all the help!
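For reference, a minimal end-to-end sketch of the corrected script. The LOAD path, delimiter, and the input schema (logName, runTime as chararray) are assumptions based on the snippets above; adjust them to match the actual data.

```
-- Load the raw logs; path, delimiter, and input types are assumed here.
uLogs = LOAD 'ulogs' USING PigStorage(',') AS (logName:chararray, runTime:chararray);

-- Explicit (double) cast converts the chararray value.
-- Writing 'runTime AS et:double' alone only declares the schema
-- and leaves the value a String, which later fails inside SUM.
uLogUrls = FOREACH uLogs GENERATE logName AS url, (double)runTime AS et;

-- Group by URL and total the transaction times per group.
grps = GROUP uLogUrls BY url;
total_tx_time = FOREACH grps GENERATE group AS url, SUM(uLogUrls.et) AS total_et;

DUMP total_tx_time;
```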
Upvotes: 1