Nabil

Reputation: 1811

Hadoop Ingestion automation techniques

My context is:

10 CSV files are uploaded to my server during the night.

My process is:

I am looking for best practices to automate the first part and trigger the second part.

I have also looked at https://kylo.io/ . It looks perfect, but I think it is still too young to put into production.

Thanks in advance.

Upvotes: 0

Views: 601

Answers (1)

alpeshpandya

Reputation: 492

Oozie and NiFi will both work, in combination with Flume, Hive, and Spark actions.

So your (Oozie or NiFi) workflow should look like this:

  1. A cron job (or time-based schedule) initiates the workflow.

  2. The first step in the workflow is a Flume process that loads the data into the desired HDFS directories. You can do this without Flume, using just an HDFS command, but Flume helps keep your solution scalable for the future.

  3. A Hive action creates or updates the table.

  4. Spark actions execute your custom Spark programs.
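If you go the Oozie route, the steps above could be sketched as a workflow definition along these lines. This is only a minimal sketch: the action names, `load_csv.sh` script, HiveQL file, and Spark class/jar names are placeholders you would replace with your own, and the exact schema versions depend on your Oozie release.

```xml
<workflow-app name="nightly-ingest" xmlns="uri:oozie:workflow:0.5">
  <start to="load-to-hdfs"/>

  <!-- Step 2: load the CSV files into HDFS. Shown here as a shell action
       wrapping an HDFS command; a Flume-based step could replace it. -->
  <action name="load-to-hdfs">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>load_csv.sh</exec>
      <file>${nameNode}/apps/ingest/load_csv.sh</file>
    </shell>
    <ok to="update-hive"/>
    <error to="notify-failure"/>
  </action>

  <!-- Step 3: create/update the Hive table. -->
  <action name="update-hive">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>create_or_update_table.hql</script>
    </hive>
    <ok to="run-spark"/>
    <error to="notify-failure"/>
  </action>

  <!-- Step 4: run the custom Spark program. -->
  <action name="run-spark">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>csv-processing</name>
      <class>com.example.CsvJob</class>
      <jar>${nameNode}/apps/ingest/csv-job.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="notify-failure"/>
  </action>

  <!-- Error handling: every action routes here on failure. -->
  <kill name="notify-failure">
    <message>Ingestion failed at [${wf:lastErrorNode()}]: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>

  <end name="end"/>
</workflow-app>
```

Each action's `<error>` transition funnels into a single kill node, which gives you one place to hook in logging and notifications.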

Make sure you take care of error handling in the workflow, with proper logging and notifications, so that you can operationalize the workflow in production.
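For the time-based trigger in step 1, an Oozie coordinator can stand in for cron. A minimal sketch, assuming a nightly 02:00 UTC run and a hypothetical `/apps/ingest/` application path:

```xml
<coordinator-app name="nightly-ingest-coord"
                 frequency="${coord:days(1)}"
                 start="2017-06-01T02:00Z" end="2099-01-01T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- Points at the workflow definition that runs the ingestion steps. -->
      <app-path>${nameNode}/apps/ingest</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Keeping the schedule inside Oozie (rather than an external cron job) means reruns, SLA monitoring, and the run history all live in one place.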

Upvotes: 2
