user3508520

Polling data from REST API to HDFS

I have a blog that offers a REST API for downloading data. The API returns the list of topics (as JSON), and it is possible to iterate over that list to download the messages of each topic. I want to download all messages of the forum every day and store them in HDFS.
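
The iteration described above (fetch the topic list, then one request per topic) can be sketched as follows. This is a minimal sketch: `ForumApi`, `listTopicIds` and `fetchMessagesJson` are hypothetical names standing in for whatever the blog's API actually exposes, and the in-memory fake in `main` exists only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ForumCrawler {

    /** Hypothetical view of the REST API: one call for the topic list, one call per topic. */
    public interface ForumApi {
        List<String> listTopicIds() throws Exception;
        String fetchMessagesJson(String topicId) throws Exception;
    }

    /**
     * Downloads the messages of every topic, keyed by topic ID.
     * A LinkedHashMap keeps the order in which the API listed the topics.
     */
    public static Map<String, String> downloadAll(ForumApi api) throws Exception {
        Map<String, String> messagesByTopic = new LinkedHashMap<>();
        for (String topicId : api.listTopicIds()) {
            messagesByTopic.put(topicId, api.fetchMessagesJson(topicId));
        }
        return messagesByTopic;
    }

    public static void main(String[] args) throws Exception {
        // In-memory fake standing in for the real API, for illustration only.
        ForumApi fake = new ForumApi() {
            public List<String> listTopicIds() { return List.of("1", "2"); }
            public String fetchMessagesJson(String id) { return "[{\"topic\":\"" + id + "\"}]"; }
        };
        System.out.println(downloadAll(fake).keySet()); // prints [1, 2]
    }
}
```

Keeping the API behind an interface makes the crawl loop easy to test without network access.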

I was thinking about writing a Java program that calls the API to get the data and stores it on HDFS using the Hadoop API. I could run the Java program as a daily Oozie batch job.
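
For the HTTP side of such a program, the JDK's own `java.net.http.HttpClient` (Java 11+) is enough. The sketch below assumes a hypothetical endpoint layout (`{base}/topics` and `{base}/topics/{id}/messages`) and topic objects that carry an `"id"` field; the regex-based `extractIds` is only a crude stand-in for a proper JSON library such as Jackson. Writing the results to HDFS through Hadoop's `FileSystem` API is not shown here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ForumHttpClient {

    // Crude extractor for "id" values; a stand-in for a real JSON parser such as Jackson.
    private static final Pattern ID_FIELD = Pattern.compile("\"id\"\\s*:\\s*\"?([^\",}\\s]+)\"?");

    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();
    private final URI baseUri;

    public ForumHttpClient(URI baseUri) {
        this.baseUri = baseUri; // should end with "/" so resolve() appends to it
    }

    // Hypothetical endpoint layout: {base}/topics and {base}/topics/{id}/messages.
    public static URI topicsUri(URI base) {
        return base.resolve("topics");
    }

    public static URI messagesUri(URI base, String topicId) {
        return base.resolve("topics/" + topicId + "/messages");
    }

    public static List<String> extractIds(String json) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID_FIELD.matcher(json);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // Performs a GET and fails loudly on any non-200 status.
    private String get(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("GET " + uri + " returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public List<String> listTopicIds() throws IOException, InterruptedException {
        return extractIds(get(topicsUri(baseUri)));
    }

    public String fetchMessagesJson(String topicId) throws IOException, InterruptedException {
        return get(messagesUri(baseUri, topicId));
    }
}
```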

Is there a better way to do this? For example, storing the data on the local file system and putting the file on HDFS at the end. I was also wondering whether Flume can be used in this case, and what its added value would be.
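
The "store locally, then put" variant could look like this minimal sketch: one JSON file per topic is written to a local staging directory, and the whole directory is then copied with the `hdfs dfs -put` command. It assumes the `hdfs` CLI is installed and on the `PATH`; the class name and the `topic-<id>.json` file naming are illustrative choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

public class StageThenPut {

    /** Writes one JSON file per topic into localDir (created if missing) and returns it. */
    public static Path stageLocally(Path localDir, Map<String, String> messagesByTopic) throws IOException {
        Files.createDirectories(localDir);
        for (Map.Entry<String, String> e : messagesByTopic.entrySet()) {
            Path file = localDir.resolve("topic-" + e.getKey() + ".json");
            Files.writeString(file, e.getValue(), StandardCharsets.UTF_8);
        }
        return localDir;
    }

    /** Builds the "hdfs dfs -put" command that copies the staged directory to hdfsDir. */
    public static List<String> putCommand(Path localDir, String hdfsDir) {
        return List.of("hdfs", "dfs", "-put", localDir.toString(), hdfsDir);
    }

    /** Runs the upload once everything is staged; requires the hdfs CLI on the PATH. */
    public static void upload(Path localDir, String hdfsDir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(putCommand(localDir, hdfsDir)).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("hdfs dfs -put failed with exit code " + exit);
        }
    }
}
```

Uploading in one step at the end means HDFS never sees a half-written day. Note that `-put` refuses to overwrite an existing target unless `-f` is given, so rerunning the same day needs either `-f` or a cleanup step.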

Thanks in advance

Upvotes: 1

Views: 562

Answers (1)

Paul H.

Reputation: 1074

This seems to be a fairly simple program. You can use any language or tool to read JSON from a REST API and then upload the content to HDFS.

You also need a scheduler to run the job.

Oozie with a Java or shell action provides better tracking in terms of job history. I would go for this if Oozie is already available.
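
With an Oozie Java action, the main class receives its arguments from the workflow definition (the `<arg>` elements), so one option is to have the scheduler pass in the run date and derive a per-day HDFS directory from it; rerunning a day then targets the same directory. The `dt=` naming and the argument convention below are assumptions for illustration, not something Oozie requires.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DailyTarget {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ISO_LOCAL_DATE; // yyyy-MM-dd

    /**
     * Maps a run date to a per-day directory under baseDir, so that a rerun
     * of the same day writes to the same path.
     */
    public static String targetDir(String baseDir, LocalDate runDate) {
        String base = baseDir.endsWith("/") ? baseDir.substring(0, baseDir.length() - 1) : baseDir;
        return base + "/dt=" + runDate.format(DAY);
    }

    public static void main(String[] args) {
        // Hypothetical convention: args[0] is the run date as yyyy-MM-dd; default to today.
        LocalDate runDate = args.length > 0 ? LocalDate.parse(args[0]) : LocalDate.now();
        System.out.println(targetDir("/data/forum", runDate));
    }
}
```

Running `java DailyTarget 2024-03-05` prints `/data/forum/dt=2024-03-05`.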

Upvotes: 1
