Reputation:
I have a blog that offers a REST API to download data. The API returns the list of topics (in JSON), and it's possible to iterate over the list to download the messages of each topic. I want to download all the messages of the forum every day and store them in HDFS.
I was thinking about writing a Java program that calls the API to get the data and stores it on HDFS using the Hadoop API. I could run the Java program within a daily Oozie batch.
Is there a better way to do this? Maybe store the data on the local file system first and put the files on HDFS at the end. I was also wondering if Flume could be used in this case, and what its added value would be?
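To make the question concrete, here is a minimal sketch of what I have in mind for the Java program. The API URL, the class name and the HDFS layout are made-up placeholders, and I've left the Hadoop `FileSystem` write as a comment since that part depends on the cluster config:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

public class ForumDump {

    // Target directory on HDFS, partitioned by day, e.g. /data/forum/2024-05-01
    // (the /data/forum prefix is a placeholder).
    static String dailyDir(LocalDate day) {
        return "/data/forum/" + day; // LocalDate.toString() is ISO yyyy-MM-dd
    }

    // Fetch one URL as a String (the topic list, or one topic's messages).
    static String fetch(HttpClient client, String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client
                .send(req, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8))
                .body();
    }

    public static void main(String[] args) throws Exception {
        // In the real job:
        //   HttpClient client = HttpClient.newHttpClient();
        //   String topics = fetch(client, "https://blog.example.com/api/topics");
        //   ... parse the topic list, fetch each topic's messages, and write
        //   each one under dailyDir(LocalDate.now()) with Hadoop's
        //   FileSystem.create(new Path(...)).
        System.out.println("would write under " + dailyDir(LocalDate.now()));
    }
}
```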
Thanks in advance
Upvotes: 1
Views: 562
Reputation: 1074
This seems to be a fairly "simple" program. You can use any language / tool to read JSON from a REST API and then upload the content to HDFS.
You also need a scheduler to run the job daily.
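If Oozie is not available, even a plain cron entry would do; something like this (the time, jar path and log path are placeholders):

```
# crontab entry: run the download every day at 02:00
0 2 * * * /usr/bin/java -jar /opt/forum-dump.jar >> /var/log/forum-dump.log 2>&1
```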
With an Oozie java or shell action, you get better tracking in terms of job history. I would go for this if Oozie is already available.
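As a rough idea, the Oozie workflow wrapping a java action could look like the sketch below (the app name, main class and transitions are placeholders; you would normally pair this with a coordinator for the daily schedule):

```xml
<workflow-app name="forum-dump" xmlns="uri:oozie:workflow:0.5">
  <start to="download"/>
  <action name="download">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <main-class>com.example.ForumDump</main-class>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Forum download failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```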
Upvotes: 1