jaksky

Reputation: 3405

Running Apache Pig tutorial problems

I am having some difficulties running the "standard" Pig tutorial script script1-hadoop.pig.

However, because of the cluster setup (users), I had to modify the example a bit. The standard tutorial expects all files in the HDFS root directory /, which I cannot use in my case, so I created a /pig directory for that purpose:

drwxrwxrwx   - hdfs   hdfs          0 2014-03-31 11:15 /pig

with the tutorial data uploaded there:

-rw-r--r--   3 jakub hdfs   10408717 2014-03-31 10:41 /pig/excite.log.bz2
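For reference, a layout like the one in the listings above can be created with the standard HDFS CLI; this is a sketch using the user and group names from the question (and assuming the hdfs super-user account is available via sudo), not necessarily the exact commands run:

# create the world-writable /pig directory as the HDFS super-user
sudo -u hdfs hdfs dfs -mkdir /pig
sudo -u hdfs hdfs dfs -chmod 777 /pig

# upload the tutorial data as the regular user (jakub)
hdfs dfs -put excite.log.bz2 /pig/excite.log.bz2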

I also modified the Pig script script1-hadoop.pig to reflect those changes (mainly just the LOAD and STORE commands):

raw = LOAD '/pig/excite.log.bz2' USING PigStorage('\t') AS (user, time, query);
...
STORE ordered_uniq_frequency INTO '/pig/script1-hadoop-results' USING PigStorage();

I ran the Pig script:

[jakub@hadooptools pigtmp]$ pig script1-hadoop.pig

but with no luck, getting the following error:

2014-03-31 10:15:11,896 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: Permission denied: user=jakub, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5202)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5184)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5158)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3405)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3375)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3349)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:724)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:502)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59598)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)

I am not quite sure why the Pig script is trying to write into / on HDFS. I know that Pig can store some intermediate results on HDFS, so I modified the pig.temp.dir property (/etc/pig/conf/pig.properties) and created the location /pig/tmp on HDFS:

drwxrwxrwx   - jakub hdfs          0 2014-03-31 11:15 /pig/tmp
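The change in /etc/pig/conf/pig.properties amounts to a single property; a sketch of what the modified line would look like:

# point Pig's intermediate (temporary) data at the new HDFS location
pig.temp.dir=/pig/tmp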

Any idea what might be wrong? Pig in local mode works fine.

Upvotes: 0

Views: 745

Answers (1)

jaksky

Reputation: 3405

Sorted.

The user running the Pig script has to have permission to write to the created temp directory, and a home directory /user/<pig_user_running> has to exist on the cluster as well, with permissions that allow that user to write there. The super-user on HDFS is the user under which the NameNode process runs, which is typically hdfs.
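A minimal sketch of the fix with the HDFS CLI, using the user name from the question and run as the HDFS super-user (here via sudo -u hdfs, assuming that account exists on the edge node):

# create the home directory for the user running Pig and make him the owner
sudo -u hdfs hdfs dfs -mkdir -p /user/jakub
sudo -u hdfs hdfs dfs -chown jakub:hdfs /user/jakub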

Upvotes: 1
