sacrac

Reputation: 21

Reading large hdfs file from a python script

I have a Python script that needs to process a large file. The code works fine when I run it on a reduced version of the file, but on the original data the script takes forever to execute. I am considering storing the file in HDFS and reading it from the Python script. To use HDFS, do I have to convert my script into a MapReduce program, or can I use the same code?

Upvotes: 2

Views: 1180

Answers (1)

Jakob Homan

Reputation: 2294

You'll likely need to tweak your Python code and then use Hadoop Streaming to process the file. This is exactly the type of situation for which Streaming was intended.
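To illustrate the kind of tweak involved: Hadoop Streaming runs a script as the mapper, feeding it input lines on stdin and collecting tab-separated key/value lines from stdout. Here is a minimal sketch of a script restructured that way. The names `process_line` and `run_mapper` are hypothetical, and the word-splitting logic is only a stand-in for whatever the original script does per record.

```python
#!/usr/bin/env python
"""Sketch of a Hadoop Streaming mapper (illustrative, not the asker's code)."""
import sys


def process_line(line):
    # Hypothetical per-record logic: replace this with the real processing.
    # Here each whitespace-separated token becomes a (key, 1) pair.
    return [(token, 1) for token in line.split()]


def run_mapper(stdin=sys.stdin, stdout=sys.stdout):
    # Streaming delivers one input record per line on stdin and expects
    # "key<TAB>value" lines on stdout.
    for line in stdin:
        for key, value in process_line(line):
            stdout.write("%s\t%d\n" % (key, value))


if __name__ == "__main__":
    run_mapper()
```

Because the script only reads stdin and writes stdout, it can be checked locally on a small sample by piping a file into it before submitting a job. It would then be passed to the Hadoop Streaming jar through the `-mapper` option, together with `-input` and `-output` paths on HDFS; the jar's location depends on the installation.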

Upvotes: 3
