I'm new to Hadoop and was curious about the command-line messages from my Pig script:
Total records written : 7676
Total bytes written : 341396
Spillable Memory Manager spill count : 103
Total bags proactively spilled: 39
Total records proactively spilled: 32389322
The job ends with "Success!", but I'm still not sure what these numbers mean.
Thanks.
The first two show the total number of records and bytes written to HDFS by your MapReduce job.
During a MapReduce job, it can happen that not all records fit into memory.
The spill counters show how much data had to be written to the local disks of your datanodes to avoid running out of memory.
Pig uses two mechanisms to control memory usage and spill to disk when necessary:
1. Spillable Memory Manager:
This is a central place where spillable bags are registered. When memory runs low, the manager goes through the list of registered bags, asks them to spill, and performs a GC.
2. Proactive (self) spilling:
Bags can also spill themselves once their memory limit is reached (see pig.cachedbag.memusage).
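To make the two mechanisms more concrete, here is a toy Python model of them. It is not Pig's code: Pig itself is written in Java, and the names ToyBag and ToyMemoryManager, the record-count limits, and the "low memory" test are hypothetical simplifications. Real Pig bags estimate their size in bytes, and the real manager reacts to JVM heap-usage notifications. The counters are named after the lines in your job summary.

```python
import gc
import pickle
import tempfile


class ToyBag:
    """Hypothetical stand-in for a Pig spillable bag; not Pig's actual class."""

    def __init__(self, limit, stats):
        self.limit = limit        # assumed unit: max records held in memory
        self.stats = stats        # shared counters named after Pig's summary
        self.memory = []          # records currently in memory
        self.spill_files = []     # temp files holding spilled batches
        self.counted = False      # already counted as a proactive spiller?

    def add(self, record):
        self.memory.append(record)
        # Proactive (self) spilling: the bag spills once it exceeds its own limit.
        if len(self.memory) > self.limit:
            if not self.counted:
                self.stats["bags_proactively_spilled"] += 1
                self.counted = True
            self.stats["records_proactively_spilled"] += len(self.memory)
            self.spill()

    def spill(self):
        """Write in-memory records to a local temp file; return how many were spilled."""
        if not self.memory:
            return 0
        f = tempfile.TemporaryFile()  # binary mode, deleted automatically when closed
        pickle.dump(self.memory, f)
        self.spill_files.append(f)
        spilled, self.memory = len(self.memory), []
        return spilled

    def __iter__(self):
        # Nothing is lost: read spilled batches back in order, then the in-memory tail.
        for f in self.spill_files:
            f.seek(0)
            yield from pickle.load(f)
        yield from self.memory


class ToyMemoryManager:
    """Hypothetical central registry, loosely modeled on the Spillable Memory Manager."""

    def __init__(self, threshold, stats):
        self.threshold = threshold  # assumed "low memory" signal: total in-memory records
        self.stats = stats
        self.bags = []

    def register(self, bag):
        self.bags.append(bag)

    def check(self):
        if sum(len(b.memory) for b in self.bags) <= self.threshold:
            return
        # Low memory: go through the registered bags and ask each one to spill.
        for bag in self.bags:
            if bag.spill():
                self.stats["smm_spill_count"] += 1
        gc.collect()  # then run a GC to reclaim the freed memory


def demo():
    stats = {"smm_spill_count": 0, "bags_proactively_spilled": 0,
             "records_proactively_spilled": 0}
    manager = ToyMemoryManager(threshold=20, stats=stats)
    bag_a = ToyBag(limit=10, stats=stats)    # small limit: spills itself
    bag_b = ToyBag(limit=1000, stats=stats)  # large limit: relies on the manager
    manager.register(bag_a)
    manager.register(bag_b)
    for i in range(30):
        bag_a.add(i)
        bag_b.add(i)
        manager.check()
    return stats, bag_a, bag_b


if __name__ == "__main__":
    stats, bag_a, bag_b = demo()
    print(stats)
```

Both bags still hand back all 30 records in order; spilling only changes how much of each bag lives on disk, and that disk traffic is the cost the spill counters measure.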
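Back to the statistics you have:
- Spillable Memory Manager spill count (103): the number of spills performed by the Spillable Memory Manager because memory ran low.
- Total bags proactively spilled (39): the number of bags that spilled themselves after reaching their memory limit.
- Total records proactively spilled (32389322): the number of records those bags wrote to disk.
None of these counters indicates a failure; "Success!" means the job completed, but it did a considerable amount of spilling along the way.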
It's always worth checking your job's spill statistics, since a lot of spilling can mean a significant performance hit that you should try to avoid.
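If you want to keep an eye on these numbers across runs, a small Python sketch like the following can pull them out of the job summary. It assumes the summary lines look like the ones in the question ("name : number"); the function names are hypothetical, and the spilled-per-written ratio is only a rough, informal indicator, not a metric Pig reports.

```python
import re

# Sample taken from the job summary in the question.
SAMPLE = """\
Total records written : 7676
Total bytes written : 341396
Spillable Memory Manager spill count : 103
Total bags proactively spilled: 39
Total records proactively spilled: 32389322
"""

# "name : number", with or without spaces around the colon.
LINE = re.compile(r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<value>\d+)\s*$")

SPILL_COUNTERS = (
    "Spillable Memory Manager spill count",
    "Total bags proactively spilled",
    "Total records proactively spilled",
)


def parse_summary(text):
    """Collect 'name : number' lines from the job output into a dict."""
    stats = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            stats[m.group("name")] = int(m.group("value"))
    return stats


def spill_report(stats):
    """Return the non-zero spill counters and a rough spilled-vs-written ratio."""
    nonzero = {k: stats[k] for k in SPILL_COUNTERS if stats.get(k, 0) > 0}
    written = stats.get("Total records written", 0)
    spilled = stats.get("Total records proactively spilled", 0)
    ratio = spilled / written if written else None
    return nonzero, ratio


if __name__ == "__main__":
    nonzero, ratio = spill_report(parse_summary(SAMPLE))
    for name, value in nonzero.items():
        print(f"{name}: {value}")
    if ratio is not None:
        print(f"records spilled per record written: {ratio:.1f}")
```

Any non-zero spill counter is worth a second look; a job whose spilled records dwarf its output, as here, is a candidate for tuning.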