user2450223

Reputation: 235

pig beginner's example [unexpected error]

I am new to Linux and Apache Pig. I am following this tutorial to learn Pig: http://salsahpc.indiana.edu/ScienceCloud/pig_word_count_tutorial.htm

This is a basic word counting example. The data file 'input.txt' and the program file 'wordcount.pig' are in the Wordcount package, linked on the site.

I already have Pig 0.11.1 downloaded on my local machine, as well as Hadoop, and Java 6.

When I downloaded the Wordcount package, I got a "tar.gz" file. I am unfamiliar with this type and wasn't sure how to extract it. It contains the files 'input.txt', 'wordcount.pig' and a Readme file. I saved 'input.txt' to my Desktop. I wasn't sure where to save 'wordcount.pig', so I decided to type in the commands line by line in the shell.

I ran Pig in local mode as follows: pig -x local

and then I just copy-pasted each line of the wordcount.pig script at the grunt> prompt like this:

A = load '/home/me/Desktop/input.txt';

B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;

C = group B by word;

D = foreach C generate COUNT(B), group;

dump D;

This generates the following errors: ...

Retrying connect to server: localhost/127.0.0.1:8021. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.

My questions:

1. Should I save 'input.txt' and the original 'wordcount.pig' script in some special folder inside the pig-0.11.1 directory? That is, should I create a folder called word inside pig-0.11.1, put 'wordcount.pig' and 'input.txt' there, and then type "wordcount.pig" at the grunt> prompt? In general, if I have data in, say, 'dat.txt' and a script in, say, 'program.pig', where should I save them in order to run 'program.pig' from the grunt shell? I think they should both go in pig-0.11.1 so that I can run $ pig -x local wordcount.pig, but I am not sure.

2. Why am I not able to run the script line by line as I tried to? I have specified the location of 'input.txt' in the load statement, so why does Pig not just run the commands line by line and dump the contents of D to my screen?

3. When I try to run Pig in mapreduce mode using $ pig, it gives this error:

retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-06-03 23:57:06,956 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

Upvotes: 0

Views: 5359

Answers (2)

user1122711

Reputation: 1

Error 2043 occurs when Hadoop and Pig fail to communicate with each other.

Never use right click --> "Extract Here" when dealing with tar.gz files.

You should always extract them from the terminal with tar -xzvf *.tar.gz.

I noticed that Pig doesn't get installed properly when you right-click the pig..tar.gz file and select "Extract Here". It's better to run tar -xzvf pig..tar.gz from the terminal.
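As a rough sketch of the terminal route, the commands below build a small stand-in archive and then extract it with tar -xzvf. The names demo_pkg, wordcount_demo.tar.gz and extracted are made up for illustration; substitute the archive you actually downloaded.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a stand-in archive (hypothetical contents, only for this demo).
mkdir demo_pkg
printf 'hello pig hello hadoop\n' > demo_pkg/input.txt
tar -czf wordcount_demo.tar.gz demo_pkg

# -t lists the archive contents without extracting anything.
tar -tzf wordcount_demo.tar.gz

# -x extracts, -z handles gzip, -v prints each file, -f names the archive,
# and -C picks the directory to extract into.
mkdir extracted
tar -xzvf wordcount_demo.tar.gz -C extracted
```

Listing with -t first is a cheap way to see what an unfamiliar archive will unpack before it writes anything to disk.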

Make sure you are running Hadoop before you execute commands like pig -x local.

If you want to run *.pig files from the grunt> prompt, use: grunt> exec *.pig

If you want to run pig files outside the grunt> prompt, use: $ pig -x local *.pig
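Here is a minimal sketch of the non-interactive route, assuming pig is on your PATH. It writes a tiny input file and the word-count script from the question into a scratch directory, then runs the script with pig -x local. The scratch directory and sample text are placeholders, and the relative path 'input.txt' in the load statement assumes you start pig from the directory that holds the file. Nothing here needs to live inside the pig-0.11.1 directory.

```shell
#!/bin/sh
set -e

# Scratch directory standing in for wherever you keep your script and data.
workdir=$(mktemp -d)
cd "$workdir"

# Tiny sample input (made up for this demo).
printf 'hello pig\nhello hadoop\n' > input.txt

# The quoted 'EOF' keeps the shell from expanding $0 inside the script.
cat > wordcount.pig <<'EOF'
A = load 'input.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = foreach C generate COUNT(B), group;
dump D;
EOF

# Run the script in local mode only if pig is actually installed.
if command -v pig >/dev/null 2>&1; then
    pig -x local wordcount.pig || echo "pig exited with an error" >&2
else
    echo "pig not found on PATH; skipping the run" >&2
fi
```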

Upvotes: 0

reo katoa

Reputation: 5811

This error indicates that Pig is unable to connect to Hadoop to run the job. You say you have downloaded Hadoop, but have you installed it? If you have installed it, have you started it up according to its docs, for example by running the bin/start-all.sh script? Using -x local tells Pig to use the local filesystem instead of HDFS, but the connection retries to localhost:8021 in your log show that it is still trying to reach a Hadoop instance to perform the execution. Before trying to run Pig, follow the Hadoop docs to get your local "cluster" set up and make sure your NameNode, DataNodes, etc. are up and running.
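One way to check whether the daemons are up is to look at the output of jps, which ships with the JDK and prints one "<pid> <ClassName>" line per running Java process. The sketch below is only an illustration: filter_hadoop_daemons is a made-up helper name, and the daemon names it looks for assume a Hadoop 1.x style setup (the kind started by bin/start-all.sh).

```shell
#!/bin/sh

# Keep only the jps-style lines ("<pid> <ClassName>") that name a
# Hadoop 1.x daemon. Hypothetical helper, not part of Hadoop or Pig.
filter_hadoop_daemons() {
    grep -E ' (NameNode|SecondaryNameNode|DataNode|JobTracker|TaskTracker)$' || true
}

if command -v jps >/dev/null 2>&1; then
    running=$(jps | filter_hadoop_daemons)
    if [ -n "$running" ]; then
        echo "Hadoop daemons currently running:"
        echo "$running"
    else
        echo "No Hadoop daemons found; they may need to be started first."
    fi
else
    echo "jps not found; it is normally installed with the JDK." >&2
fi
```

If the list comes back empty, that would be consistent with the "Retrying connect to server" lines in the question.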

Upvotes: 3
