Reputation: 137
Hadoop is installed on a remote machine (example.host.com), and Pig is installed on that machine as well.
How can I access the HDFS on that machine from another machine?
I do not want to copy the files from the remote machine; I just want to run queries on those files, which are stored in Avro format and carry their schema with them.
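For illustration, the kind of query I want to run looks roughly like this (only a sketch: the jar paths and the HDFS path are placeholders, and I am assuming AvroStorage can pick up the schema embedded in the .avro files):
cat > query.pig <<'EOF'
-- placeholder jar paths; AvroStorage lives in piggybank
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/avro.jar;
-- load the remote Avro files and look at a few records
records = LOAD 'hdfs://example.host.com:8020/data/events'
          USING org.apache.pig.piggybank.storage.avro.AvroStorage();
DUMP records;
EOF
pig query.pig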
I installed Pig on my local machine and added the following lines to the pig.properties file:
fs.default.name=hdfs://example.host.com:8020
mapred.job.tracker=example.host.com:8021
But when I start Pig, it gives the following error:
2013-02-15 12:35:26,534 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.1-SNAPSHOT (rexported) compiled Feb 14 2013, 17:55:12
2013-02-15 12:35:26,535 [main] INFO org.apache.pig.Main - Logging error messages to: /log/path/pig_1360911926530.log
2013-02-15 12:35:26,818 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://example.host.com:8020
2013-02-15 12:35:26.907 java[2346:1c03] Unable to load realm info from SCDynamicStore
2013-02-15 12:35:27,574 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
Details at logfile: /log/path/pig_1360911926530.log
The content of the log file "/log/path/pig_1360911926530.log" is:
Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:205)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:118)
at org.apache.pig.impl.PigContext.connect(PigContext.java:208)
at org.apache.pig.PigServer.<init>(PigServer.java:246)
at org.apache.pig.PigServer.<init>(PigServer.java:231)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:47)
at org.apache.pig.Main.run(Main.java:487)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 14 more
================================================================================
Upvotes: 0
Views: 2066
Reputation: 627
As you can see, the exception you're getting is a version mismatch. Are you sure that XXX has $HADOOP_HOME in its classpath? It really looks like it's pointing to the wrong jars.
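For example, something along these lines should confirm (and fix) it; the paths below are illustrative, not exact:
# compare the client and server versions
hadoop version                        # run on your local machine
hadoop version                        # run on example.host.com
# then point the local Pig launcher at a Hadoop install that matches
# the cluster; the pig script should pick up the client jars from there
export HADOOP_HOME=/path/to/hadoop-matching-the-cluster
export PATH=$HADOOP_HOME/bin:$PATH
pig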
Upvotes: 1
Reputation: 33495
Extract the Hadoop tar file on the local machine, point the configuration files to the NameNode in the cluster, and then use the hadoop fs -get command to get the files from the remote machine to the local machine.
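A rough outline of those steps; the Hadoop version and the paths below are only examples:
tar -xzf hadoop-1.0.4.tar.gz
cd hadoop-1.0.4
# in conf/core-site.xml, point the client at the cluster's NameNode:
#   <property>
#     <name>fs.default.name</name>
#     <value>hdfs://example.host.com:8020</value>
#   </property>
bin/hadoop fs -get /path/on/hdfs /local/destination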
Upvotes: 0