Reputation: 12568
I'm getting an OutOfMemoryError from Pig when trying to execute a very simple GROUP BY on a tiny (3 KB), randomly-generated example data set.
The pig script:
$ cat example.pig
raw =
    LOAD 'example-data'
    USING PigStorage()
    AS (thing1_id:int,
        thing2_id:int,
        name:chararray,
        timestamp:long);

grouped =
    GROUP raw BY thing1_id;

DUMP grouped;
The data:
$ cat example-data
281906 13636091 hide 1334350350
174952 20148444 save 1334427826
1082780 16033108 hide 1334500374
2932953 14682185 save 1334501648
1908385 28928536 hide 1334367665
[snip]
$ wc example-data
100 400 3239 example-data
Here we go:
$ pig -x local example.pig
[snip]
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
[snip]
And some extra info:
$ apt-cache show hadoop | grep Version
Version: 1.0.2
$ pig --version
Apache Pig version 0.9.2 (r1232772)
compiled Jan 17 2012, 23:49:20
$ echo $PIG_HEAPSIZE
4096
At this point, I feel like I must be doing something drastically wrong, because I can't see any reason why 3 KB of text would ever cause the heap to fill up.
Upvotes: 4
Views: 5862
Reputation: 169
Check this: http://sumedha.blogspot.in/2012/01/solving-apache-pig-javalangoutofmemorye.html
Neil, you are right; let me explain it like this. In the bin/pig script file, the source is:

JAVA_HEAP_MAX=-Xmx1000m

# check envvars which might override default args
if [ "$PIG_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$PIG_HEAPSIZE""m"
fi

It only sets the maximum heap size (the "x" in -Xmx) using the -Xmx switch, but I don't know why this override is not taking effect; that is why I asked you to specify the Java heap size directly, using the parameters given in the link. I haven't had time to check why this problem arises. If anyone has an idea, please post it here.
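One way to see whether the override actually reaches the JVM is to ask the JVM itself. Here is a minimal diagnostic sketch (my own, not part of Pig or Hadoop) that prints the effective maximum heap:

// HeapCheck.java -- hypothetical diagnostic, not part of Pig or Hadoop.
// Prints the maximum heap the running JVM was actually started with.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Effective max heap: " + (maxBytes >> 20) + " MB");
    }
}

Running it with the flag bin/pig constructs, e.g. java -Xmx4096m HeapCheck, should report roughly 4096 MB (maxMemory() comes in a little under the -Xmx value); if a JVM launched the same way Pig launches its own reports something much smaller, the PIG_HEAPSIZE override is being lost somewhere in the script.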
Upvotes: 1
Reputation: 12568
I toyed with it for a while and ended up switching from the Debian packages for Hadoop/Pig to the raw tarballs, and the problem went away. Not sure what to make of that :)
Upvotes: 0
Reputation: 30089
Your Pig job is failing around the following code in MapTask.java:
931 final float recper = job.getFloat("io.sort.record.percent",(float)0.05);
932 final int sortmb = job.getInt("io.sort.mb", 100);
...
945 // buffers and accounting
946 int maxMemUsage = sortmb << 20;
947 int recordCapacity = (int)(maxMemUsage * recper);
948 recordCapacity -= recordCapacity % RECSIZE;
949 kvbuffer = new byte[maxMemUsage - recordCapacity];
So I suggest that you check what the configured values of io.sort.mb and io.sort.record.percent are, and whether, following the above logic, maxMemUsage - recordCapacity comes close to, or exceeds, your configured JVM heap size (4096 MB).
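To make the arithmetic concrete, here is a standalone sketch of the buffer sizing above, plugging in the stock defaults (io.sort.mb = 100, io.sort.record.percent = 0.05, and RECSIZE = 16, which I believe is the accounting-record constant in Hadoop 1.x MapTask). This is my own illustration, not Hadoop code:

// BufferMath.java -- illustrative only; mirrors the arithmetic quoted above.
public class BufferMath {
    public static void main(String[] args) {
        final float recper = 0.05f;   // io.sort.record.percent default
        final int sortmb = 100;       // io.sort.mb default
        final int RECSIZE = 16;       // accounting bytes per record (Hadoop 1.x)

        int maxMemUsage = sortmb << 20;                    // 104857600 bytes (~100 MB)
        int recordCapacity = (int) (maxMemUsage * recper); // ~5 MB for accounting
        recordCapacity -= recordCapacity % RECSIZE;
        int kvbufferSize = maxMemUsage - recordCapacity;   // ~95 MB allocated at line 949

        System.out.println("kvbuffer = new byte[" + kvbufferSize + "]  (~"
                + (kvbufferSize >> 20) + " MB)");
    }
}

With the defaults, the allocation at line 949 is only about 95 MB, which is nowhere near 4096 MB, so if the OutOfMemoryError really happens there, the map task's JVM is most likely not receiving the heap you set via PIG_HEAPSIZE.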
Upvotes: 0