Reputation: 21
I have a Python script that needs to process a large file. The script works fine when I run it on a reduced version of the file, but on the original data it takes forever to finish. I'm considering storing the file in HDFS and reading it from the Python script. But to use HDFS, do I have to convert my Python script into a MapReduce program, or can I use the same code?
Upvotes: 2
Views: 1180
Reputation: 2294
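You'll likely need to tweak your Python code and then run it with Hadoop Streaming. Streaming lets any executable act as the mapper or reducer, as long as it reads input lines from stdin and writes tab-separated key/value lines to stdout, so most of your existing logic can be reused rather than rewritten. This is exactly the kind of situation Streaming was designed for.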
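To show what "tweaking" the script might involve, here is a minimal sketch. The question doesn't say what the processing is, so it assumes a word count as a hypothetical stand-in. The file name `wordcount.py`, the `map`/`reduce` arguments, and the jar and HDFS paths in the docstring are all placeholders. The mapper turns raw input lines into `key<TAB>value` lines. Hadoop sorts those lines by key before handing them to the reducer, so the reducer can group consecutive lines that share a key.

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming sketch (word count as a hypothetical example task).

Run as mapper:  python3 wordcount.py map
Run as reducer: python3 wordcount.py reduce

A possible launch command (jar path and HDFS paths are placeholders):
  hadoop jar /path/to/hadoop-streaming.jar \
      -files wordcount.py \
      -input /user/me/input -output /user/me/output \
      -mapper "python3 wordcount.py map" \
      -reducer "python3 wordcount.py reduce"
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator, Sequence


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit one "word<TAB>1" record per word; by default Streaming treats
    # the text before the first tab as the key.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input arrives sorted by key, so all records for one word are adjacent.
    # Blank lines are skipped so they don't break the key/value split.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main(argv: Sequence[str]) -> None:
    # Pick the stage from the command-line argument; print usage otherwise.
    if len(argv) < 2 or argv[1] not in ("map", "reduce"):
        print("usage: wordcount.py map|reduce", file=sys.stderr)
        return
    stage = mapper if argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the mapper and reducer as plain generator functions over lines means you can test the whole pipeline locally, without a cluster, with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. The `sort` step stands in for Hadoop's shuffle.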
Upvotes: 3