Reece Markowsky

Reputation: 123

pig SUM FOREACH GROUP ClassCastException: java.lang.String cannot be cast to java.lang.Number

I have a set of URLs and associated transaction times in Hadoop, and I am trying to write a Pig script that gives me the total transaction time for each URL. I get a ClassCastException every time I try to SUM the transaction times. This is my first attempt at Pig, so any help is appreciated; I can't figure out what I'm doing wrong.

Here is some output showing the URLs and transaction times:

grunt> DESCRIBE uLogUrls
uLogUrls: {url: chararray,et: int}
grunt> DUMP uLogUrls

(/index.jsp,344)
(/another/Access.jsp,517)
(/index.jsp,5)
(/another/NoAccess.jsp,4)
(/index.jsp,5)
(/index.jsp,4)

grps = GROUP uLogUrls BY url;
DUMP grps

(/index.jsp,{(/index.jsp,344),(/index.jsp,5),(/index.jsp,5),(/index.jsp,4)})
(/home/home.jsp,{(/home/home.jsp,11200)})


grunt> DESCRIBE grps
grps: {group: chararray,uLogUrls: {(url: chararray,et: int)}}

total_tx_time = foreach grps generate group as url, SUM(uLogUrls.et);
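
For reference, here is the complete pipeline I am aiming for as one script (the LOAD path, loader, and delimiter below are placeholders; the field names match the DESCRIBE output above):

```pig
-- Hypothetical end-to-end version of the statements above.
-- The input path and PigStorage delimiter are assumptions.
uLogUrls = LOAD 'urls.tsv' USING PigStorage('\t') AS (url:chararray, et:int);
grps = GROUP uLogUrls BY url;
total_tx_time = FOREACH grps GENERATE group AS url, SUM(uLogUrls.et) AS total_et;
DUMP total_tx_time;
```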

When I execute DUMP total_tx_time I get:

    2015-09-11 19:28:05,370 [Thread-44] INFO  org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
    2015-09-11 19:28:05,372 [Thread-44] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local1410240575_0002
    java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: urlgrp: Local Rearrange[tuple]{chararray}(false) - scope-73 Operator Key: scope-73): org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Doubles
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: urlgrp: Local Rearrange[tuple]{chararray}(false) - scope-73 Operator Key: scope-73): org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Doubles
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:197)
    at   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:175)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:50)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1688)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1637)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Doubles
    at org.apache.pig.builtin.AlgebraicDoubleMathBase.doTupleWork(AlgebraicDoubleMathBase.java:82)
    at org.apache.pig.builtin.AlgebraicDoubleMathBase$Intermediate.exec(AlgebraicDoubleMathBase.java:106)
    at org.apache.pig.builtin.AlgebraicDoubleMathBase$Intermediate.exec(AlgebraicDoubleMathBase.java:100)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:323)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:362)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:383)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:303)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
    ... 17 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
    at org.apache.pig.builtin.AlgebraicDoubleMathBase.doTupleWork(AlgebraicDoubleMathBase.java:75)
    ... 25 more

Any ideas what I'm doing wrong?

Thanks!

Upvotes: 0

Views: 820

Answers (1)

Reece Markowsky

Reputation: 123

The cause was that in the original FOREACH that generated uLogUrls, I did not actually cast runTime to double:

uLogUrls = FOREACH uLogs GENERATE logName as url, runTime as et:double;

The statement above is what created this exception: `as et:double` only declares the field's type in the output schema, it does not convert the underlying value, which was still a chararray (notice there are no decimal places in the numbers).

DUMP uLogUrls

(/index.jsp,344)
(/secur/blah.jsp,517)
(/index.jsp,5)
(/secur/blah.jsp,4)
(/index.jsp,5)
....snip....

But when I cast it like this:

grunt> uLogUrls = FOREACH uLogs GENERATE logName as url, (double)runTime as et;
grunt> DUMP uLogUrls

(/index.jsp,344.0)
(/secur/blah.jsp,517.0)
(/index.jsp,5.0)
(/secur/blah.jsp,4.0)
(/index.jsp,5.0)
...snip....

then the GROUP and SUM functions work. Thanks for all the help!
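
Putting it together, a sketch of the working pipeline (uLogs, logName, and runTime as in the statements above; the LOAD step, path, and delimiter are assumptions):

```pig
-- Assumed LOAD step; the real path, loader, and schema may differ.
uLogs = LOAD 'ulogs.tsv' USING PigStorage('\t') AS (logName:chararray, runTime:chararray);
-- (double) performs a real runtime conversion, unlike "as et:double",
-- which only relabels the schema and leaves the value a chararray.
uLogUrls = FOREACH uLogs GENERATE logName AS url, (double)runTime AS et;
grps = GROUP uLogUrls BY url;
total_tx_time = FOREACH grps GENERATE group AS url, SUM(uLogUrls.et) AS total_et;
DUMP total_tx_time;
```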

Upvotes: 1
