Reputation: 132
I have a Pig script that queries data from a CSV file.
The script has been tested locally with small and large .csv files.
On a small cluster: it starts processing the script, but fails after completing about 40% of the job.
The error is just:
Failed to read data from "path to file"
What I infer is that the script could read the file, but then there was some connection drop or a lost message.
However, the only error I get is the one mentioned above.
Upvotes: 1
Views: 479
Reputation: 132
An answer to the general problem would be to change the error levels in the configuration files by adding these two lines to mapred-site.xml:
log4j.logger.org.apache.hadoop = error,A
log4j.logger.org.apache.pig = error,A
In my case, it was an OutOfMemory exception.
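If editing the cluster configuration files is not an option, Pig can also be pointed at a log4j properties file directly through its -4 (-log4jconf) command-line option. A minimal sketch, assuming the hypothetical file name log4j_quiet.properties and script name myscript.pig:
# write a small log4j config that defines an appender named A and applies the levels above
cat > log4j_quiet.properties <<'EOF'
log4j.appender.A=org.apache.log4j.ConsoleAppender
log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.logger.org.apache.hadoop=error,A
log4j.logger.org.apache.pig=error,A
EOF
# hand the file to Pig; -4 (alias -log4jconf) overrides the default log configuration
pig -4 log4j_quiet.properties myscript.pig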
Upvotes: 2
Reputation: 22296
Check your logs and increase the verbosity level if needed, but you are probably facing an Out of Memory error.
Check this answer on how to change Pig logging.
To change the memory in Hadoop, modify the hadoop-env.sh file, as documented here:
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"
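For example, to raise that limit you would edit the same line in hadoop-env.sh; the 2048m value below is only an illustration, size it to your data:
# raise the client-side JVM heap from the 128m default (2048m is an example value)
export HADOOP_CLIENT_OPTS="-Xmx2048m ${HADOOP_CLIENT_OPTS}"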
For Apache Pig, you have this in the header of the pig bash script:
# PIG_HEAPSIZE The maximum amount of heap to use, in MB.
# Default is 1000.
So you can use export or set it in your .bashrc file:
$ export PIG_HEAPSIZE=4096
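To make the setting permanent and re-run the job, something like this should work (the 4096 value and the script name myscript.pig are placeholders):
# append the export to ~/.bashrc so every new shell picks it up, then reload it
echo 'export PIG_HEAPSIZE=4096' >> ~/.bashrc
source ~/.bashrc
# run the Pig script again with the larger heap
pig myscript.pig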
Upvotes: 1