Makar Nikitin

Reputation: 329

How to replace old legacy hdfs connector with new fs.HadoopFileSystem?

When I try to replace the legacy hdfs connector

from pyarrow import hdfs
fs = hdfs.connect()

which works great, with the new fs connector

from pyarrow import fs
client = fs.HadoopFileSystem(host="default")

the Python kernel crashes. What am I doing wrong?

Upvotes: 3

Views: 2539

Answers (1)

Chess

Reputation: 151

Check whether you have these three environment variables set. In your terminal, check with:

echo $HADOOP_HOME
echo $JAVA_HOME
echo $ARROW_LIBHDFS_DIR
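
If you prefer to check from inside Python, a quick sketch like this (standard library only) prints the same three variables:

import os

# Print the variables pyarrow's HDFS binding relies on;
# None means the variable is not set in this environment.
for var in ("HADOOP_HOME", "JAVA_HOME", "ARROW_LIBHDFS_DIR"):
    print(var, "=", os.environ.get(var))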

If they are not set, you may want to configure your environment before using pyarrow. You could try this in Python:

import os
from pyarrow import fs

# Point pyarrow at the Hadoop install and the native libhdfs library.
# (JAVA_HOME should already point at your JDK; set it here as well if it is missing.)
os.environ['HADOOP_HOME'] = '<path to hadoop binaries>'
os.environ['ARROW_LIBHDFS_DIR'] = '<directory containing libhdfs.so>'

# Connect with a full URI (host, port and user) ...
fs.HadoopFileSystem("hdfs://namenode:8020?user=hdfsuser")
# ... or just the namenode host; fs.HadoopFileSystem("namenode") should work too
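
Once it connects, a quick way to confirm the filesystem is actually usable is to list the root directory. This is just a sketch; the namenode host, port and user below are placeholders for your own cluster:

from pyarrow import fs

# Placeholder connection details; substitute your namenode host and HDFS user.
hdfs = fs.HadoopFileSystem("namenode", port=8020, user="hdfsuser")

# List everything directly under / to verify the connection works.
for info in hdfs.get_file_info(fs.FileSelector("/", recursive=False)):
    print(info.path, info.type)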

See this: How do i set the path of libhdfs.so for pyarrow?

Upvotes: 4
