Reputation: 321
I am trying to read a file on HDFS with Python using the hdfs3 module.
import hdfs3
hdfs = hdfs3.HDFileSystem(host='xxx.xxx.com', port=12345)
hdfs.ls('/projects/samplecsv/part-r-00000')
This produces
[{'block_size': 134345348,
'group': 'supergroup',
'kind': 'file',
'last_access': 1473453452,
'last_mod': 1473454723,
'name': '/projects/samplecsv/part-r-00000/',
'owner': 'dr',
'permissions': 420,
'replication': 3,
'size': 98765631}]
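The listing is a list of dicts, one per entry, with the keys shown above. Below is a minimal sketch that separates files from directories before anything is passed to `open()`. It assumes only that format. The helper name `split_listing` is hypothetical and not part of hdfs3, and the sample entry is copied (with a subset of keys) from the output above.

```python
def split_listing(listing):
    """Split ls()-style entries into (files, directories) using their 'kind' key.

    Trailing slashes are stripped from 'name' so the paths can be reused directly.
    """
    files, dirs = [], []
    for entry in listing:
        name = entry['name'].rstrip('/') or '/'  # keep the root as '/'
        if entry['kind'] == 'file':
            files.append(name)
        else:
            dirs.append(name)
    return files, dirs


# The entry reported by hdfs.ls() in the question (subset of its keys).
listing = [{'kind': 'file', 'name': '/projects/samplecsv/part-r-00000/', 'size': 98765631}]
files, dirs = split_listing(listing)
print(files)  # ['/projects/samplecsv/part-r-00000']
print(dirs)   # []
```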
So it seems to be able to access HDFS and read the directory structure. However, reading the file fails.
with hdfs.open('/projects/samplecsv/part-r-00000', 'rb') as f:
    print(f.read(100))
gives
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-94-46f0db8e87dd> in <module>()
1 with hdfs.open('/projects/samplecsv/part-r-00000', 'rb') as f:
----> 2 print(f.read(100))
/anaconda3/lib/python3.5/site-packages/hdfs3/core.py in read(self, length)
615 length -= ret
616 else:
--> 617 raise IOError('Read file %s Failed:' % self.path, -ret)
618
619 return b''.join(buffers)
OSError: [Errno Read file /projects/samplecsv/part-r-00000 Failed:] 1
What could be the issue? I am using Python 3.5.
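In Python 3, `IOError` is an alias of `OSError`. That is why the `IOError` raised in `core.py` shows up as `OSError` in the traceback. The odd message also follows from this: with two arguments, `OSError` treats the first as `errno` and the second as `strerror`, so the `1` at the end is the value of `-ret`. Below is a minimal sketch of catching the error around the read and reporting the path. It assumes only that the opened object has `read(length)`, as in the question. `read_head` is a hypothetical helper, and `fake_open` with `io.BytesIO` stands in for a real HDFS file.

```python
import io


def read_head(open_file, path, nbytes=100):
    """Return up to nbytes from path, or None if opening or reading raises OSError.

    open_file is any callable taking (path, mode) and returning a context manager
    with read(length), e.g. hdfs.open from the question.
    """
    try:
        with open_file(path, 'rb') as f:
            return f.read(nbytes)
    except OSError as exc:  # also catches IOError, the same class in Python 3
        print('could not read %s: %s' % (path, exc))
        return None


def fake_open(path, mode):
    # Hypothetical stand-in for hdfs.open: serves one in-memory file, fails otherwise.
    if path == '/tmp/ok':
        return io.BytesIO(b'a,b,c\n1,2,3\n')
    raise IOError('Read file %s Failed:' % path, 1)


print(read_head(fake_open, '/tmp/ok', 5))    # b'a,b,c'
print(read_head(fake_open, '/tmp/missing'))  # prints the error message, then None
```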
Upvotes: 2
Views: 2255
Reputation: 670
If you want to read multiple files from an HDFS directory, you can try the example below:
import os
import hdfs3
hdfs = hdfs3.HDFileSystem(host='xxx.xxx.com', port=12345)
file_loc = '/projects/samplecsv/part-r-00000'
hdfs.ls(file_loc)
# Upload the file if it is not already present; put() takes the full target file path.
hdfs.put('local-file.txt', os.path.join(file_loc, 'local-file.txt'))
# Print the first 100 bytes of every .txt file in the directory.
for file in hdfs.glob(os.path.join(file_loc, '*.txt')):
    with hdfs.open(file, 'rb') as f:
        print(f.read(100))
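HDFS paths always use forward slashes, so `posixpath.join` is a safer choice than `os.path.join`, which uses backslashes on Windows. Below is a minimal sketch that builds the pattern and applies the same `*.txt` filter locally to a list of names (for example, names taken from a listing). The helper `txt_files` and the sample names are hypothetical. It approximates `hdfs.glob` for a single directory level and does not reproduce it.

```python
import fnmatch
import posixpath


def txt_files(directory, names):
    """Return the names directly inside directory whose base name matches *.txt.

    fnmatch's '*' also matches '/', so the parent directory is checked separately
    to restrict the match to one level, as a shell-style glob would.
    """
    return [
        n for n in names
        if posixpath.dirname(n) == directory
        and fnmatch.fnmatchcase(posixpath.basename(n), '*.txt')
    ]


file_loc = '/projects/samplecsv/part-r-00000'
names = [
    file_loc + '/local-file.txt',
    file_loc + '/part-00000.csv',
    file_loc + '/sub/other.txt',
]
print(posixpath.join(file_loc, '*.txt'))  # /projects/samplecsv/part-r-00000/*.txt
print(txt_files(file_loc, names))         # ['/projects/samplecsv/part-r-00000/local-file.txt']
```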
Upvotes: 2
Reputation: 208
If you want to perform any operation on a file, you have to pass the full file path.
import hdfs3
hdfs = hdfs3.HDFileSystem(host='xxx.xxx.com', port=12345)
hdfs.ls('/projects/samplecsv/part-r-00000')
# Upload the file first if it is not there, giving the full target file path.
hdfs.put('local-file.txt', '/projects/samplecsv/part-r-00000/local-file.txt')
# Open it by its full, absolute path (note the leading '/').
with hdfs.open('/projects/samplecsv/part-r-00000/local-file.txt', 'rb') as f:
    print(f.read(100))
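Below is a minimal sketch of building the full HDFS path for an uploaded file from the target directory and the local file name, with a check that the directory is absolute. `remote_path` is a hypothetical helper. The check matters because a path without a leading `/`, such as `projects/samplecsv/...`, is normally resolved relative to the user's working (home) directory rather than the root.

```python
import os
import posixpath


def remote_path(local_file, remote_dir):
    """Join remote_dir and the base name of local_file into an absolute HDFS path."""
    if not remote_dir.startswith('/'):
        # Reject relative HDFS directories instead of silently resolving them elsewhere.
        raise ValueError('remote_dir must be absolute, got %r' % remote_dir)
    # The local path uses the OS separator; only its base name is kept.
    name = os.path.basename(local_file)
    # HDFS paths always use '/', so join with posixpath regardless of the local OS.
    return posixpath.join(remote_dir, name)


target = remote_path('local-file.txt', '/projects/samplecsv/part-r-00000')
print(target)  # /projects/samplecsv/part-r-00000/local-file.txt
```

The same `target` string can then be passed to both `hdfs.put('local-file.txt', target)` and `hdfs.open(target, 'rb')`, so the upload and the read cannot drift apart.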
Upvotes: 3