Reputation: 6148
I launched two m1.medium nodes on amazon ec2 for executing my pig script, but looks like it failed at the first line (even before MapReduce start): raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000' USING TextLoader as (line:chararray);
The error message I got:
2015-02-04 02:15:39,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-02-04 02:15:39,821 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Default number of map tasks: null
2015-02-04 02:15:39,822 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 20
... (omitted)
2015-02-04 02:18:40,955 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201502040202_0002 has failed! Stop running all dependent jobs
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-02-04 02:18:40,997 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 1.0.3 0.11.1.1-amzn hadoop 2015-02-04 02:15:32 2015-02-04 02:18:40 GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201502050202_0002 ngroup,raw,triples,tt GROUP_BY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201502050202_0002_m_000022
Input(s):
Failed to read data from "s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
I think the code should be fine since I have ever successfully loaded other data with the same syntax, and the link to s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000
looks valid. I suspect it might be related to some of my EC2 settings, but not sure how to investigate further or narrow down the problem. Anyone has a clue?
Upvotes: 2
Views: 1029
Reputation: 6148
The problem was currently solved by changing my node from m1.medium to m3.large , thanks for the good hint from @Nat as he pointed out the error message regarding with java heap space. I'll update more details later.
Upvotes: 2
Reputation: 3717
"Java heap space" error message gives some clues. Your files seem to be quite large (~2GB). Make sure that you have enough memory for each task runner to read the data.
Upvotes: 2