mnm

Reputation: 2022

Pydoop error: RuntimeError: java home not found, try setting JAVA_HOME on remote server using CDH5.4

Objective: To read a remote file stored in HDFS using pydoop, from my laptop. I'm using PyCharm Professional Edition and Cloudera CDH5.4.

PyCharm configuration on my laptop: in the project interpreter (under Settings), I have pointed the Python interpreter to the remote server as ssh://remote-server-ip-address:port-number/home/ashish/anaconda/bin/python2.7

Now there is a file stored in HDFS location /home/ashish/pencil/someFileName.txt

Then I installed pydoop on the remote server using pip install pydoop, and it installed successfully. Then I wrote this code to read the file from the HDFS location:

import pydoop.hdfs as hdfs

with hdfs.open('/home/ashish/pencil/someFileName.txt') as f:
    for line in f:
        print(line)

On execution I get this error:

Traceback (most recent call last):
  File "/home/ashish/PyCharm_proj/Remote_Server_connect/hdfsConxn.py", line 7, in <module>
    import pydoop.hdfs as hdfs
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/__init__.py", line 82, in <module>
    from . import common, path
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/path.py", line 28, in <module>
    from . import common, fs as hdfs_fs
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/fs.py", line 34, in <module>
    from .core import core_hdfs_fs
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/core/__init__.py", line 49, in <module>
    _CORE_MODULE = init(backend=HDFS_CORE_IMPL)
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/core/__init__.py", line 29, in init
    jvm.load_jvm_lib()
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/utils/jvm.py", line 33, in load_jvm_lib
    java_home = get_java_home()
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/utils/jvm.py", line 28, in get_java_home
    raise RuntimeError("java home not found, try setting JAVA_HOME")
RuntimeError: java home not found, try setting JAVA_HOME

Process finished with exit code 1

My guess is that maybe it's not able to find py4j. The location of py4j is

/home/ashish/anaconda/lib/python2.7/site-packages/py4j

And when I echo JAVA_HOME on the remote server,

 echo $JAVA_HOME

I get this location,

/usr/java/jdk1.7.0_67-cloudera

I am new to programming in Python as well as to the CentOS setup. Please suggest what I can do to solve this problem.

Thanks

Upvotes: 0

Views: 1695

Answers (2)

mnm

Reputation: 2022

Well, it looks like I solved it. What I did was use

sys.path.append('/usr/java/jdk1.7.0_67-cloudera')

I updated the code:

import sys

# Make the Cloudera JDK directory visible before any Hadoop-related imports
sys.path.append('/usr/java/jdk1.7.0_67-cloudera')

input_file = '/home/ashish/pencil/someData.txt'
with open(input_file) as f:
    for line in f:
        print line

This code reads a file from HDFS on a remote server and then prints the output in the PyCharm console on my laptop.

By using sys.path.append() you don't have to manually change the hadoop-env.sh file and risk conflicts with other Java configuration files.
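As an aside, sys.path is Python's module search path, so another sketch worth trying (an assumption on my part, based on the traceback showing pydoop's get_java_home() failing) is to set JAVA_HOME in the process environment before importing pydoop, since a remote interpreter launched over SSH may not inherit the login shell's exported variables:

```python
import os

# Set JAVA_HOME for this process before importing pydoop; the path below
# is the Cloudera JDK reported by `echo $JAVA_HOME` in the question.
os.environ['JAVA_HOME'] = '/usr/java/jdk1.7.0_67-cloudera'

print(os.environ['JAVA_HOME'])
```

After this line runs, `import pydoop.hdfs as hdfs` in the same script should see the variable.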

Upvotes: 1

Esteban Herrera

Reputation: 2291

You can try setting JAVA_HOME in hadoop-env.sh (it's commented out by default).

Change:

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

To:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

Or whatever your java installation directory is.
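If you're not sure of the directory, one common way to discover a JAVA_HOME candidate (assuming the `java` binary is on your PATH) is to resolve its real location and strip the trailing `/bin/java`:

```shell
# readlink -f follows the symlink chain to the real java binary,
# e.g. /usr/java/jdk1.7.0_67-cloudera/bin/java; two dirname calls
# strip /bin/java to leave the JDK root for JAVA_HOME.
dirname "$(dirname "$(readlink -f "$(which java)")")"
```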

Upvotes: 0
