Hari Prasad

Reputation: 1901

API data to Hadoop via Flume

I have an API that returns data in XML format.

I would like to run this daily and store the returned data in Hadoop, but I am a bit lost after going through the Flume setup documentation. Does anyone have end-to-end steps for this use case: pulling data from a simple external API like the one above via Flume, and scheduling it with Oozie?

Currently, I have a Java program that pulls the data and writes it to a file named indeed_ddmmyyyyhhmmss.xml, followed by a similarly named tab-delimited .txt file for ease of use. I can run it daily with cron and create an external table in Hive that points to the file location, but this doesn't look like an elegant solution to me.

Upvotes: 2

Views: 530

Answers (1)

Dmitry Zaytsev

Reputation: 182

You might use Flume's embedded agent feature inside your Java program and send the events directly to a Flume instance.
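As a rough sketch of what that could look like, the helper below builds the property map an embedded agent would be configured with. The class name `EmbeddedAgentConfig`, the host `collector.example.org`, and the port `41414` are made up for illustration; the property keys mirror the embedded agent example in the Flume Developer Guide, so check them against the Flume version you run. It also assumes a regular Flume agent is already listening on that host and port with an Avro source and an HDFS sink, since the embedded agent can only forward to Avro sinks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Flume): builds the properties for an
// embedded agent that forwards events to one remote Avro collector.
final class EmbeddedAgentConfig {

    private EmbeddedAgentConfig() {
    }

    static Map<String, String> build(String collectorHost, int collectorPort) {
        if (collectorHost == null || collectorHost.isEmpty()) {
            throw new IllegalArgumentException("collectorHost must not be empty");
        }
        if (collectorPort < 1 || collectorPort > 65535) {
            throw new IllegalArgumentException("collectorPort out of range: " + collectorPort);
        }

        Map<String, String> props = new LinkedHashMap<>();
        // Events are buffered in memory before being sent on.
        props.put("channel.type", "memory");
        props.put("channel.capacity", "200");
        // A single sink named "sink1"; the embedded agent only supports avro sinks.
        props.put("sinks", "sink1");
        props.put("sink1.type", "avro");
        props.put("sink1.hostname", collectorHost);
        props.put("sink1.port", Integer.toString(collectorPort));
        // Sink processor, as in the Developer Guide example (assumed to be
        // acceptable with a single sink; verify for your Flume version).
        props.put("processor.type", "load_balance");
        return props;
    }

    public static void main(String[] args) {
        // Host and port are placeholders for your own collector agent.
        build("collector.example.org", 41414)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the Flume SDK on the classpath, the map would be passed to `EmbeddedAgent.configure(...)`, followed by `start()`. Each API response can then be wrapped with `EventBuilder.withBody(...)` and handed to `put(...)`, with `stop()` called when the program exits. That way the existing Java program can keep its cron (or Oozie) schedule and skip the intermediate files.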

Upvotes: 1
