Praveen Gr

Reputation: 197

How to integrate NLTK with Hadoop HDFS?

I have a working sentiment analysis program using NLTK that reads text from a .txt file on my local machine. Now I would like to read a .txt file stored in Hadoop HDFS and perform the same sentiment analysis.

How can I achieve this?

Any pointers on this topic would be greatly appreciated!

Upvotes: 0

Views: 598

Answers (1)

greedybuddha

Reputation: 7507

This won't work directly unless NLTK can read from HDFS itself. However, most programs like NLTK let you pass data straight into them. Assuming that is the case here, you can use what I suggest in this other answer, "How to run external program within mapper or reducer giving HDFS files as input and storing output files in HDFS?". You essentially write a small Java adapter that opens an input stream on the HDFS file and passes the data to the program you want to run.
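The same "open a stream and pass the data in" idea can also be sketched from the Python side, without the Java adapter. This is a minimal sketch, not the adapter from the linked answer: it assumes the `hdfs` command-line client is installed and on the PATH of the machine running the script, and it shells out to `hdfs dfs -cat`. The function names, the `cat_cmd` parameter, and the `analyze` callback (standing in for your existing sentiment function) are all hypothetical.

```python
import subprocess


def read_hdfs_text(hdfs_path, cat_cmd=("hdfs", "dfs", "-cat")):
    """Return the contents of an HDFS file as a string.

    Assumption: the `hdfs` CLI is available locally. `cat_cmd` is a
    hypothetical parameter so the command can be swapped (for example
    to ("hadoop", "fs", "-cat")) or replaced in tests.
    """
    result = subprocess.run(
        [*cat_cmd, hdfs_path],
        check=True,           # raise CalledProcessError if the command fails
        capture_output=True,  # collect stdout instead of printing it
    )
    return result.stdout.decode("utf-8")


def analyze_hdfs_file(hdfs_path, analyze, cat_cmd=("hdfs", "dfs", "-cat")):
    """Read the HDFS file and hand its text to an existing analysis function.

    `analyze` is a placeholder for whatever function your NLTK program
    already applies to the text it reads from the local .txt file.
    """
    text = read_hdfs_text(hdfs_path, cat_cmd)
    return analyze(text)


# Hypothetical usage (path and function name are made up):
# scores = analyze_hdfs_file("/user/me/reviews.txt", my_sentiment_function)
```

This reads the whole file into memory, which is fine for files of the size you would otherwise open locally; for very large inputs a MapReduce or streaming job, as in the linked answer, is the better fit.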

If that sounds like too much trouble, or just isn't possible in your case, you can always use the HDFS get command (hadoop fs -get, or hdfs dfs -get) to copy the file to a local path and run your existing program on that copy.
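The copy-to-local fallback could look roughly like this. Again, this is only a sketch under assumptions: the `hdfs` CLI must be on the PATH, and `fetch_from_hdfs`, `local_dir`, and `get_cmd` are hypothetical names introduced for illustration.

```python
import os
import subprocess
import tempfile


def fetch_from_hdfs(hdfs_path, local_dir=None, get_cmd=("hdfs", "dfs", "-get")):
    """Copy an HDFS file to the local disk and return the local path.

    Assumption: the `hdfs` CLI is installed. `get_cmd` is a hypothetical
    parameter so the command can be swapped for ("hadoop", "fs", "-get").
    """
    # Use a fresh temporary directory unless the caller supplies one.
    local_dir = local_dir or tempfile.mkdtemp(prefix="hdfs_copy_")
    local_path = os.path.join(local_dir, os.path.basename(hdfs_path))
    # Equivalent to running: hdfs dfs -get <hdfs_path> <local_path>
    subprocess.run([*get_cmd, hdfs_path, local_path], check=True)
    return local_path


# Hypothetical usage (path is made up): the returned path can be passed
# to the existing program exactly like the original local .txt file.
# local_file = fetch_from_hdfs("/user/me/reviews.txt")
```

After the copy, nothing in the NLTK code has to change, since it still opens an ordinary local file.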

Upvotes: 0
