Shai Harel

Reputation: 31

Hadoop pig latin unable to stream through a python script

I have a simple Python script (moo.py) that I am trying to stream through:

import sys

for line in sys.stdin:
    print 1
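For reference, the per-record logic can be pulled into a plain function so it can be exercised outside Hadoop; a minimal sketch (hypothetical names, using the Python 3 print function rather than the Python 2 statement above):

```python
import sys

def process(lines):
    # Mirror moo.py: emit the string "1" for every input record.
    return ["1" for _ in lines]

if __name__ == "__main__":
    # Hadoop streaming feeds one record per line on stdin.
    for out in process(sys.stdin):
        print(out)
```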

and I try to run it with this Pig script:

DEFINE CMD `python moo.py` SHIP('moo.py');
data = LOAD 's3://path/to/my/data/*' AS (a:chararray, b:chararray, c:int, d:int);
res = STREAM data THROUGH CMD;
DUMP res;

When I run this Pig script locally (pig -x local), everything is fine, but when I run it without -x local, it prints this error:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.

[Log file]

Caused by: java.io.FileNotFoundException: File moo.py does not exist.

Any ideas?

Upvotes: 3

Views: 1119

Answers (2)

Shai Harel

Reputation: 31

The problem was that I used ship() instead of cache(). ship() works fine for passing local files from the master to the slaves, but cache() is what the slaves use to obtain files from an accessible location, such as S3 on Amazon.
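A sketch of the corrected DEFINE, assuming the script has been uploaded to a hypothetical S3 location (the #moo.py suffix names the local symlink the task nodes will see):

```
DEFINE CMD `python moo.py` CACHE('s3://my-bucket/scripts/moo.py#moo.py');
```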

hope that helps anyone :]

Upvotes: 0

frail

Reputation: 4118

It's most likely an issue with a relative path.

try:

DEFINE CMD `python moo.py` ship('/local/path/to/moo.py');

It can also be a read/write/execute permission issue.
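A quick way to rule that out is to make sure the script is readable and executable before shipping it; a sketch (touch stands in for the real moo.py here):

```shell
touch moo.py          # stand-in for the real script in this sketch
chmod 755 moo.py      # owner rwx, group/other r-x
ls -l moo.py          # should show -rwxr-xr-x
```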

Upvotes: 5
