pig beginner's example [unexpected error]

Question

I am new to Linux and Apache Pig. I am following this tutorial to learn pig: http://salsahpc.indiana.edu/ScienceCloud/pig_word_count_tutorial.htm

This is a basic word counting example. The data file 'input.txt' and the program file 'wordcount.pig' are in the Wordcount package, linked on the site.

I already have Pig 0.11.1 downloaded on my local machine, as well as Hadoop, and Java 6.

When I downloaded the Wordcount package it took me to a "tar.gz" file. I am unfamiliar with this type, and wasn't sure how to extract it. It contains the files 'input.txt','wordcount.pig' and a Readme file. I saved 'input.txt' to my Desktop. I wasn't sure where to save wordcount.pig, and decided to just type in the commands line by line in the shell.

I ran pig in local mode as follows:pig -x local

and then I just copy-pasted each line of the wordcount.pig script at the grunt> prompt like this:

A = load '/home/me/Desktop/input.txt';

B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;

C = group B by word;

D = foreach C generate COUNT(B), group;

dump D;

This generates the following errors: ...

Retrying connect to server: localhost/127.0.0.1:8021. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.

My questions:

1. Should I be saving 'input.txt' and the original 'wordcount.pig' script to some special folder inside the directory pig-0.11.1? That is, create a folder called word inside pig-0.11.1 and put 'wordcount.pig' and 'input.txt' there and then type in "wordcount.pig" from the grunt> prompt ??? In general, if I have data in say, 'dat.txt', and a script say, 'program.pig', where should I be saving them to run 'program.pig' from the grunt shell??? I think they should both go in pig-0.11.1,so I can do $ pig -x local wordcount.pig, but I am not sure.

2. Why am I not able to run the script line by line as I tried to? I have specified the location of the file 'input.txt' in the load statement. So why does it not just run the commands line by line and dump the contents of D to my screen???

3. When I try to run Pig in mapreduce mode using $pig, it gives this error:

retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-06-03 23:57:06,956 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

reo katoa · Accepted Answer

This error indicates that Pig is unable to connect to Hadoop to run the job. You say you have downloaded Hadoop -- have you installed it? If you have installed it, have you started it up according to its docs -- have you run the bin/start-all.sh script? Using -x local tells Pig to use the local filesystem instead of HDFS, but it still needs a running Hadoop instance to perform the execution. Before trying to run Pig, follow the Hadoop docs to get your local "cluster" set up and make sure your NameNode, DataNodes, etc. are up and running.

pig beginner's example [unexpected error]

Answers (2)

Related Questions

pig beginner&#39;s example [unexpected error]

Answers (2)

Related Questions

pig beginner's example [unexpected error]