Mihir

Reputation: 603

Python and Hadoop - fetch and write data directly to hdfs using python?

I want to fetch end-of-day (EOD) stock prices daily from Yahoo/Google Finance and store them directly in a file on HDFS.

Later I can create an external table on top of that file (using Hive) and use it for further analysis.

So I am not looking for basic MapReduce, since I don't have an input file as such. Are there any connectors available in Python that can write data to Hadoop?

Upvotes: 1

Views: 2894

Answers (1)

Samson Scharfrichter

Reputation: 9067

Start by dumping your data to a local file. Then find a way to upload the file to HDFS.

  • If you are running your job on an "edge node" (i.e. a Linux box that is not part of the cluster but has all the Hadoop clients installed and configured), then you have the good old HDFS command-line interface:

hdfs dfs -put data.txt /user/johndoe/some/hdfs/dir/

  • If you are running your job anywhere else, use an HTTP library (or the good old curl command line) to connect to the HDFS REST service (either WebHDFS or HttpFS, depending on how the cluster has been set up) and upload the file with a PUT request:

http://namenode:port/webhdfs/v1/user/johndoe/some/hdfs/dir/data.txt?op=CREATE&overwrite=false

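(and the content of "data.txt" as payload, of course; note that WebHDFS answers this first PUT with a 307 redirect to a datanode, and the file content is sent in a second PUT to that redirect URL)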
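Both upload paths can be sketched in Python with the standard library only. This is a minimal sketch, not a definitive implementation: the function names are made up for illustration, "namenode:50070" in the usage note is a placeholder host and port, the user.name parameter assumes a cluster with simple (non-Kerberos) authentication, and the REST part assumes plain WebHDFS with its two-step CREATE. The file is read into memory in one go, which should be fine for small EOD price files.

```python
import http.client
import subprocess
import urllib.parse


def hdfs_put_command(local_path, hdfs_dir):
    """Build the `hdfs dfs -put` command line (edge-node case)."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def put_via_cli(local_path, hdfs_dir):
    """Run `hdfs dfs -put`; requires the Hadoop client on this machine."""
    subprocess.run(hdfs_put_command(local_path, hdfs_dir), check=True)


def webhdfs_create_url(namenode, hdfs_path, user=None, overwrite=False):
    """Build the WebHDFS CREATE URL; `namenode` is 'host:port' (placeholder)."""
    params = {"op": "CREATE", "overwrite": str(overwrite).lower()}
    if user:
        # Assumption: simple authentication, user passed as a query parameter.
        params["user.name"] = user
    return "http://{}/webhdfs/v1{}?{}".format(
        namenode, urllib.parse.quote(hdfs_path), urllib.parse.urlencode(params)
    )


def _put(url, body=None, headers=None):
    """Send one PUT without following redirects; return (status, Location)."""
    parts = urllib.parse.urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc)
    try:
        conn.request("PUT", parts.path + "?" + parts.query,
                     body=body, headers=headers or {})
        resp = conn.getresponse()
        resp.read()  # drain the response body before closing
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def put_via_webhdfs(namenode, hdfs_path, local_path, user=None):
    """Upload a local file using the two-step WebHDFS CREATE."""
    # Step 1: PUT without data; the namenode replies with a 307 redirect.
    status, location = _put(webhdfs_create_url(namenode, hdfs_path, user))
    if status != 307 or not location:
        raise RuntimeError("expected a 307 redirect, got HTTP %s" % status)
    # Step 2: PUT the file content to the URL from the Location header.
    with open(local_path, "rb") as f:
        data = f.read()
    status, _ = _put(location, body=data,
                     headers={"Content-Type": "application/octet-stream"})
    if status != 201:
        raise RuntimeError("upload failed with HTTP %s" % status)
```

On an edge node, put_via_cli("data.txt", "/user/johndoe/some/hdfs/dir/") would do the job; anywhere else, something like put_via_webhdfs("namenode:50070", "/user/johndoe/some/hdfs/dir/data.txt", "data.txt", user="johndoe") would, once the host, port and user are replaced with the real ones for your cluster.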

Upvotes: 1
