Anton Belev

Reputation: 13503

Executing Pig scripts: difference between -x local script.pig and just script.pig

At the moment I'm executing my scripts like that:

/usr/bin/pig /somepath/myscript.pig

and for some reason pig is always hanging at this stage.

2014-01-28 16:49:31,328 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

If I use

`/usr/bin/pig -x local /somepath/myscript.pig`

it complains about the paths for some reason:

Input(s):
Failed to read data from "file:///path_from_root_dir/tweets_extended_small.csv"

What's the difference, and how should I specify the path in -x local mode so that I can get rid of this error?

My tweets_extended_small.csv is in HDFS and I'm referring to it in the script like this:

... LOAD 'venues_extended_small.csv' USING ...

Thanks!

Upvotes: 2

Views: 14392

Answers (3)

Chaitanya

Reputation: 132

I would suggest these three steps:

1. Write your script and save it with the .pig extension. Sometimes the extension doesn't get applied; use "Save as" and put the file name in quotes.

2. Run it by giving the file location, for example pig -x local /home/training/Desktop/file_name.pig

3. Check the path used in the STORE command in your file.

When you are using mapreduce mode, make sure you verify the input directories by listing them.
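For example, if the script is meant to run in mapreduce mode, you can list an HDFS directory from Pig's Grunt shell with the built-in fs command to confirm the input is where you expect; the path here is just a placeholder, not the asker's actual directory:

grunt> fs -ls /user/training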

Upvotes: 0

ashubhargave

Reputation: 230

Pig basically has two execution modes

1] Local Mode

2] Map-Reduce Mode

Local Mode - When you run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and local file system.

Here *all files* means all the files you are going to process and any jars or other resources you refer to in the Pig script.

Mapreduce Mode - When you run Pig in mapreduce mode, you are dealing with a Hadoop cluster and HDFS (Hadoop Distributed File System).

In this case *all files* are expected to be in HDFS.
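To make that concrete, here is a rough sketch of how the same LOAD might look in each mode; the paths, delimiter and field names below are made-up placeholders, not the asker's actual data, and the HDFS URI assumes HDFS is the cluster's default file system:

-- local mode: the path is resolved against the local file system
tweets_local = LOAD '/home/training/tweets_extended_small.csv'
    USING PigStorage(',') AS (id:chararray, text:chararray);

-- mapreduce mode: a relative path resolves against your HDFS home directory
-- (/user/<username>), or you can spell out the HDFS location explicitly
tweets_hdfs = LOAD 'hdfs:///user/training/tweets_extended_small.csv'
    USING PigStorage(',') AS (id:chararray, text:chararray);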

So while running pig,

pig -x local script_name.pig

"-x" specifies the mode in which the script has to be ran.

So,in this case the script_name.pig needs to be under local file system.

Mapreduce mode is the default mode; when running a Pig script you can specify it with the -x flag, but you don't need to (pig OR pig -x mapreduce).

In your case,

keep the file and the script on your local machine. Load the file as

...LOAD '/YOUR_PATH_TO_INPUT_CSV_FILE/venues_extended_small.csv' USING...

and move your script to the local file system, then run it:

pig -x local '/YOUR_PATH_TO_PIG_SCRIPT/script.pig'
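Putting it together, a minimal local-mode script might look like this sketch; the path, delimiter and field names are placeholders you would swap for your own:

-- myscript.pig -- run with: pig -x local /YOUR_PATH_TO_PIG_SCRIPT/myscript.pig
-- (in local mode every path below is a local file system path; these are placeholders)
venues = LOAD '/home/training/venues_extended_small.csv'
    USING PigStorage(',') AS (id:chararray, name:chararray);
first_rows = LIMIT venues 10;
DUMP first_rows;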

This link can help you in this case.

Hope this might have helped you. Thanks.

Upvotes: 8

Chaos

Reputation: 11721

/usr/bin/pig -x local executes the pig script locally on that particular machine rather than as a distributed MapReduce job on the cluster. -x is the option to specify the execution type (options are local and mapreduce, the default).

Since your file is loaded on HDFS, it fails to recognize the path to HDFS on your local machine when you specify the local execution type.

From the details provided, I can't figure out why the command /usr/bin/pig /somepath/myscript.pig hangs. I suggest placing your csv file on the local FS and trying to run the script.
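If you go that route, a quick sanity check like the sketch below can confirm that Pig can actually read the local path before you run the full script; the file location, delimiter and schema here are placeholders:

-- run with: pig -x local check_input.pig (the script name is made up)
tweets = LOAD '/tmp/tweets_extended_small.csv'
    USING PigStorage(',') AS (id:chararray, text:chararray);
grouped = GROUP tweets ALL;
counted = FOREACH grouped GENERATE COUNT(tweets);
DUMP counted; -- prints the row count if the local path resolved correctly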

Upvotes: 1
