user538578964

Reputation: 773

Best practices when running Hadoop MapReduce jobs/Hive scripts/Pig scripts etc

I am interested in understanding how ETL jobs like Hadoop MapReduce jobs/Spark jobs/Hive scripts/Pig scripts are usually deployed in an on-premises production/development environment.

Are they always deployed and run using an orchestrator like Apache Airflow or Apache Oozie?

I'm assuming these jobs are almost never run standalone and are always run through a scheduler, even if it is just a simple scheduled bash script. Is this accurate?

It would also be extremely helpful to get some reading material on this topic.

Upvotes: 1

Views: 240

Answers (1)

Ben Watson

Reputation: 5531

It completely depends, and you'll find that most environments use a combination of the two approaches. Anything in production is likely to be scheduled - Hadoop jobs are no different from any other kind of job, and people want their production environments to be automated and reliable.
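To make "scheduled" concrete, here is a minimal sketch of what that often looks like with Airflow: a DAG that submits a Spark job once a day through a BashOperator. The DAG id, schedule and spark-submit path are made-up placeholders, and in a real environment you would more likely use a dedicated Spark provider/operator and proper connections rather than a raw shell command.

    # daily_etl_dag.py - minimal sketch of a scheduled job, assuming Airflow 2.x.
    # The DAG id, schedule and spark-submit command below are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_spark_etl",          # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",        # run once per day
        catchup=False,
    ) as dag:
        # Submit the Spark job exactly as you would by hand on an edge node.
        run_etl = BashOperator(
            task_id="spark_submit_etl",
            bash_command="spark-submit --master yarn /opt/jobs/etl_job.py",
        )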

Having said that, I have worked at companies where someone is hired to manually shepherd a critical pipeline from start to finish.

Developers will still need ways to run jobs easily and manually during development, and in that context jobs are typically run standalone.
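For contrast, a standalone run during development is often just a script invoked directly from the command line, e.g. spark-submit etl_job.py on an edge node. A minimal PySpark sketch of such a job (the input/output paths and the "status" column are invented for illustration):

    # etl_job.py - minimal PySpark job a developer might run by hand with
    # `spark-submit etl_job.py` during development. Paths and the "status"
    # column are hypothetical.
    from pyspark.sql import SparkSession

    def main():
        spark = SparkSession.builder.appName("dev_etl_job").getOrCreate()

        # Read raw data, keep only the rows we care about, write the result.
        events = spark.read.csv("/data/raw/events.csv", header=True)
        clean = events.filter(events["status"] == "OK")
        clean.write.mode("overwrite").parquet("/data/clean/events")

        spark.stop()

    if __name__ == "__main__":
        main()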

As an aside, I'm not sure that there are many people still deploying new MapReduce, Pig and Oozie jobs these days. Oozie hasn't had a release since 2019, Pig since 2017, and there's almost no reason to run MapReduce instead of Spark.

Upvotes: 2
