Bafla13

Reputation: 132

Pig: Hadoop jobs Fail

I have a Pig script that queries data from a CSV file.

The script has been tested locally with small and large .csv files.

On a small cluster: it starts processing the script, then fails after completing about 40% of the job.

The error is just: Failed to read data from "path to file"

What I infer is that the script can read the file, but that there is some connection drop or a lost message.

But the above-mentioned error is all I get.

Upvotes: 1

Views: 479

Answers (2)

Bafla13

Reputation: 132

A fix for the general problem would be to change the error levels in the configuration files by adding these two lines to mapred-site.xml:

log4j.logger.org.apache.hadoop = error,A
log4j.logger.org.apache.pig = error,A
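
Alternatively, the same two lines can go into a standalone log4j properties file that is passed to Pig on the command line via the -4 (log4jconf) option; a minimal sketch, where log4j.properties and myscript.pig are placeholder names:

# Run the script with a custom log4j configuration so the real error shows up in the client log
pig -4 log4j.properties myscript.pig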

In my case, it was an OutOfMemory exception.

Upvotes: 2

Paulo Fidalgo

Reputation: 22296

Check your logs and increase the verbosity level if needed, but you are probably facing an Out of Memory error.

Check this answer on how to change Pig logging.

To change the memory available to Hadoop, edit the hadoop-env.sh file, as you can see documented here:

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"
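
If it is the client-side JVM that runs out of memory, raising that limit is just a matter of editing the same line; the 2048m figure below is only an illustrative value, not a recommendation:

# hadoop-env.sh: give the client commands (fs, dfs, fsck, distcp etc.) a larger heap
export HADOOP_CLIENT_OPTS="-Xmx2048m ${HADOOP_CLIENT_OPTS}"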

For Apache Pig you have this in the header of the pig bash script:

# PIG_HEAPSIZE The maximum amount of heap to use, in MB.
# Default is 1000.

So you can use export directly or set it in your .bashrc file (the value is a plain number of megabytes):

$ export PIG_HEAPSIZE=4096
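
To make the change permanent and verify it is picked up, something along these lines should work; myscript.pig is just a placeholder for your own script:

# Persist the heap size and re-run the script in MapReduce mode
echo 'export PIG_HEAPSIZE=4096' >> ~/.bashrc
source ~/.bashrc
pig -x mapreduce myscript.pig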

Upvotes: 1
