Reputation: 1811
My context is:
10 CSV files are uploaded to my server during the night.
My process is:
Ingestion:
Processing:
I am looking for best practices to automate the first part and trigger the second.
I have also looked at https://kylo.io/. It seems perfect, but I think it is still too young to put into production.
Thanks in advance.
Upvotes: 0
Views: 601
Reputation: 492
Oozie and NiFi will both work, in combination with Flume, Hive, and Spark actions.
So your (Oozie or NiFi) workflow should work like this:
A cron job (or time-based schedule) initiates the workflow.
The first step in the workflow is a Flume process that loads the data into the desired HDFS directories. You can do this without Flume, using just an HDFS command, but Flume will help keep your solution scalable in the future.
A Hive action creates/updates the table.
Spark actions execute your custom Spark programs.
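The "HDFS command only" variant of the load step could be a small script along these lines. The landing directory, target path, date partitioning, and archive naming are all assumptions for illustration, not from the question:

```shell
#!/bin/sh
# Sketch: move the night's CSV drop into a date-partitioned HDFS directory.
# LANDING_DIR contents, the HDFS layout, and archiving behavior are assumptions.

load_csvs_to_hdfs() {
  landing_dir=$1    # local directory where the nightly CSVs arrive
  hdfs_base=$2      # HDFS target, partitioned by run date
  run_date=$(date +%Y-%m-%d)

  # $HDFS_CLIENT is deliberately unquoted so a dry-run override
  # such as HDFS_CLIENT="echo hdfs" word-splits into a command.
  $HDFS_CLIENT dfs -mkdir -p "$hdfs_base/run_date=$run_date" || return 1
  for f in "$landing_dir"/*.csv; do
    [ -e "$f" ] || continue                       # empty drop: nothing to do
    $HDFS_CLIENT dfs -put -f "$f" "$hdfs_base/run_date=$run_date/" \
      && mv "$f" "$landing_dir/archived_$(basename "$f")"   # archive on success
  done
}

# Defaults to the real Hadoop client; override for testing without a cluster.
HDFS_CLIENT=${HDFS_CLIENT:-hdfs}
```

Keeping the load idempotent (`-put -f`, date-partitioned target, archive-on-success) means a failed night can simply be re-run by the workflow engine.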
Make sure you take care of error handling in the workflow, with proper logging and notifications, so that you can operationalize the workflow in production.
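Put together, an Oozie workflow for the steps above might look like the following sketch. All names, paths, the Spark class, and the notification address are placeholders; a Flume source could replace the shell action for the load step, and a coordinator (time trigger) would replace the cron job for scheduling:

```xml
<workflow-app name="nightly-csv-ingest" xmlns="uri:oozie:workflow:0.5">
    <start to="load-csvs"/>

    <!-- Step 1: pull the CSVs into HDFS. -->
    <action name="load-csvs">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>load_csvs.sh</exec>
            <file>${wf:appPath()}/load_csvs.sh</file>
        </shell>
        <ok to="update-hive"/>
        <error to="notify-failure"/>
    </action>

    <!-- Step 2: create/update the Hive table over the new data. -->
    <action name="update-hive">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>update_table.hql</script>
        </hive>
        <ok to="run-spark"/>
        <error to="notify-failure"/>
    </action>

    <!-- Step 3: run the custom Spark program. -->
    <action name="run-spark">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>nightly-processing</name>
            <class>com.example.NightlyJob</class>
            <jar>${wf:appPath()}/lib/nightly-job.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="notify-failure"/>
    </action>

    <!-- Error handling: notify, then fail loudly. -->
    <action name="notify-failure">
        <email xmlns="uri:oozie:email-action:0.2">
            <to>ops@example.com</to>
            <subject>Nightly CSV workflow failed</subject>
            <body>Failed node: ${wf:lastErrorNode()}</body>
        </email>
        <ok to="fail"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed at ${wf:lastErrorNode()}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

In NiFi you would model the same pipeline as a flow of processors instead (for example GetFile into PutHDFS, then triggering the Hive and Spark steps), with failure relationships routed to a notification.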
Upvotes: 2