0xhacker

Reputation: 1119

Stream data into HDFS directly without copying

I am looking for ways to write data directly into HDFS from Python, without first storing it on the local node and then running copyFromLocal.

I would like to use an HDFS file like a local file, calling a write method with the line as the argument, something like the following:

   hdfs_file = hdfs.create("file_tmp")
   hdfs_file.write("Hello world\n")

Does anything like this exist?

Upvotes: 9

Views: 6358

Answers (1)

Chris White

Reputation: 30089

I'm not sure about a Python HDFS library, but you can always stream via the hadoop fs -put command, using '-' as the source filename to denote copying from stdin:

hadoop fs -put - /path/to/file/in/hdfs.txt
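From Python, one way to drive that command is to start it as a subprocess and write lines to its stdin. The following is only a rough sketch, assuming the hadoop binary is on the PATH; the function name stream_lines and its command parameter are made up for illustration and are not part of any HDFS library:

```python
import subprocess

def stream_lines(lines, hdfs_path, command=("hadoop", "fs", "-put", "-")):
    """Pipe lines into `hadoop fs -put - <hdfs_path>` without a local temp file.

    `command` is a hypothetical parameter so the sink can be swapped out;
    the default mirrors the command shown above.
    """
    proc = subprocess.Popen([*command, hdfs_path], stdin=subprocess.PIPE)
    # Closing stdin signals end of input so the put command can finish.
    with proc.stdin:
        for line in lines:
            proc.stdin.write(line.encode("utf-8"))
    returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"command exited with status {returncode}")
    return returncode
```

Calling stream_lines(["Hello world\n"], "/path/to/file/in/hdfs.txt") would then send the line straight through the pipe, so nothing is written to the local disk first.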

Upvotes: 14
