Reputation: 13503
At the moment I'm executing my scripts like this:
/usr/bin/pig /somepath/myscript.pig
and for some reason Pig always hangs at this stage:
2014-01-28 16:49:31,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
If I use
`/usr/bin/pig -x local /somepath/myscript.pig`
it complains about the paths for some reason:
Input(s):
Failed to read data from "file:///path_from_root_dir/tweets_extended_small.csv"
What's the difference, and how should I specify the path in -x local mode so that I can get rid of this error?
My tweets_extended_small.csv is in HDFS, and I'm referring to it in the script like this:
... LOAD 'venues_extended_small.csv' USING ...
Thanks!
Upvotes: 2
Views: 14392
Reputation: 132
I would suggest these three steps:
1. Write your script and save it with the .pig extension. Sometimes this step doesn't work out; use "Save as" and put the file name in quotation marks so the editor doesn't add another extension.
2. Run it with the file location, e.g. pig -x local /home/training/Desktop/file_name.pig
3. Check the STORE command in your script.
When you are using mapreduce mode, make sure you verify the directories by listing them, as sketched below.
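For step 3's check, one quick way is to list the input and output locations before running the script; a minimal sketch, assuming placeholder paths that are not from the original post:
# local file system (local mode)
ls /home/training/Desktop/
# HDFS (mapreduce mode)
hadoop fs -ls /user/training/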
Upvotes: 0
Reputation: 230
Pig basically has two execution modes:
1] Local Mode
2] Map-Reduce Mode
Local Mode - When you run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and local file system.
Here *all files* means every file you are going to process and every jar or other resource you refer to in the Pig script.
Mapreduce Mode - When you run Pig in mapreduce mode, you are dealing with a Hadoop cluster and HDFS (Hadoop Distributed File System).
In this case *all files* are expected to be in HDFS.
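As an illustration (the relation name, file name, and schema below are hypothetical, not from the question), the same relative path in a LOAD statement resolves against a different file system depending on the mode:
-- In local mode 'data.csv' resolves against the local working directory
-- (file:///...); in mapreduce mode it resolves against your HDFS home
-- directory (hdfs://.../user/<you>/data.csv).
raw = LOAD 'data.csv' USING PigStorage(',') AS (id:chararray, text:chararray);
DUMP raw;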
So when running Pig,
pig -x local script_name.pig
"-x" specifies the mode in which the script has to be run.
So, in this case, script_name.pig needs to be on the local file system.
Mapreduce mode is the default; when running a Pig script you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce).
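For example (the script name here is just a placeholder):
# mapreduce mode (the default)
pig script_name.pig
pig -x mapreduce script_name.pig
# local mode
pig -x local script_name.pig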
In your case, keep both the input file and the script on your local machine. Load the file as
...LOAD '/YOUR_PATH_TO_INPUT_CSV_FILE/venues_extended_small.csv' USING...
and then run the script from the local file system:
pig -x local '/YOUR_PATH_TO_PIG_SCRIPT/script.pig'
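Putting it together, a minimal local-mode script might look like the sketch below; the paths are placeholders and PigStorage(',') is only an assumption, since the question truncates the USING clause:
-- myscript.pig (run with: pig -x local /YOUR_PATH_TO_PIG_SCRIPT/myscript.pig)
venues = LOAD '/YOUR_PATH_TO_INPUT_CSV_FILE/venues_extended_small.csv'
         USING PigStorage(',');  -- delimiter is an assumption
DUMP venues;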
Hope this helps. Thanks.
Upvotes: 8
Reputation: 11721
/usr/bin/pig -x local
executes the Pig script locally on that particular machine rather than as a distributed MapReduce job on the cluster. -x
is the option to specify the execution type (options are local and mapreduce, the default).
Since your file is on HDFS, the path cannot be resolved against your local file system when you specify the local execution type.
From the details provided, I can't figure out why the command /usr/bin/pig /somepath/myscript.pig
hangs. I suggest placing your csv file on the local file system and trying to run the script in local mode.
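One way to try that without editing the LOAD path in the script is to copy the file out of HDFS into the directory you run Pig from; a sketch, assuming the file name from the question and a hypothetical HDFS home directory:
# copy the CSV from HDFS to the current local directory
hadoop fs -get /user/<your_user>/venues_extended_small.csv .
# then run the script in local mode
/usr/bin/pig -x local /somepath/myscript.pig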
Upvotes: 1