Reputation: 329
When I try to replace the legacy hdfs connector
from pyarrow import hdfs
fs = hdfs.connect()
, which works great, with the new fs connector
from pyarrow import fs
client = fs.HadoopFileSystem(host="default")
the Python kernel crashes. What am I doing wrong?
Upvotes: 3
Views: 2539
Reputation: 151
Check whether you have these three environment variables set. In your terminal, run
echo $HADOOP_HOME
echo $JAVA_HOME
echo $ARROW_LIBHDFS_DIR
If they are not set, set them before using pyarrow. You can also do this from Python:
import os

# Set the environment before pyarrow loads libhdfs
os.environ['HADOOP_HOME'] = '<path to hadoop binaries>'
os.environ['ARROW_LIBHDFS_DIR'] = '<path to libhdfs.so>'

from pyarrow import fs

fs.HadoopFileSystem("hdfs://namenode:8020?user=hdfsuser")
# fs.HadoopFileSystem("namenode") should work too
See this: How do I set the path of libhdfs.so for pyarrow?
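Since the kernel crash usually happens when libhdfs fails to load, it can help to check the environment from Python before connecting. A minimal sketch of such a check (the helper name `missing_hdfs_env` is my own; the variable names are the three from above):

import os

def missing_hdfs_env(env=os.environ):
    # Return the names of the env vars pyarrow's libhdfs bindings need
    # that are not set, so you can fail with a clear message instead of
    # a hard crash.
    required = ["HADOOP_HOME", "JAVA_HOME", "ARROW_LIBHDFS_DIR"]
    return [name for name in required if not env.get(name)]

Call missing_hdfs_env() before fs.HadoopFileSystem(...) and raise a RuntimeError listing the missing names if the result is non-empty.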
Upvotes: 4