Reputation: 1263
Background
I am looking to execute a bunch of Hive queries (roughly 20-30 queries, and growing in number). Some of these queries depend on the results of a few others, whereas some of them can be executed in parallel — together they form a DAG.
Question
Is there a workflow manager that can take care of building a DAG (given the bunch of queries as input) and executing these queries in parallel/sequentially (in the most optimal manner)?
What are the best practices for this?
Upvotes: 0
Views: 601
Reputation: 38325
This can also be easily implemented in a shell script. You can start parallel processes, wait for them, then start the next set of processes. An ampersand at the end of a command instructs the shell to run it as a background process. See this example:
#!/bin/bash
#Without pipefail, `wait` would return tee's exit status, not hive's
set -o pipefail

LOG_DIR=/tmp/my_log_dir

#Reset fail counter before starting parallel processes
FAIL=0

echo "Parallel loading 1, 2 and 3..."
hive -hiveconf "some_var"="$some_value" -f myscript_1.hql 2>&1 | tee "$LOG_DIR/myscript_1.log" &
hive -hiveconf "some_var"="$some_value" -f myscript_2.hql 2>&1 | tee "$LOG_DIR/myscript_2.log" &
hive -hiveconf "some_var"="$some_value" -f myscript_3.hql 2>&1 | tee "$LOG_DIR/myscript_3.log" &

#Wait for all three processes to finish
for job in $(jobs -p)
do
    echo "$job"
    wait "$job" || let "FAIL+=1"
done

#Exit if any process has failed
if [ "$FAIL" != "0" ]
then
    echo "Failed processes=($FAIL) Giving up..."
    exit 1
fi

#Reset fail counter before the next parallel step
FAIL=0

echo "Continue with next parallel steps 4,5..."
hive -hiveconf "some_var"="$some_value" -f myscript_4.hql 2>&1 | tee "$LOG_DIR/myscript_4.log" &
#and so on
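As an aside, if you write down the dependency pairs between scripts, a valid sequential execution order for the whole DAG can be derived with the standard `tsort` utility. A minimal sketch with hypothetical dependency pairs (each line reads "A must finish before B"):

```shell
#!/bin/bash
# Each input line declares "A B": script A must run before script B.
# tsort prints all script names in a valid topological (execution) order.
printf '%s\n' \
    "myscript_1.hql myscript_4.hql" \
    "myscript_2.hql myscript_4.hql" \
    "myscript_3.hql myscript_5.hql" \
    | tsort
```

Scripts that share no ordering constraint in the output can still be launched together with the background-process pattern.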
There are also other ways to run background processes: https://www.codeword.xyz/2015/09/02/three-ways-to-script-processes-in-parallel/
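One of those alternatives is `xargs -P`, which caps how many jobs run concurrently and exits non-zero if any job fails. A minimal sketch — the `echo` stands in for the real `hive -f` invocation, and the script names are hypothetical:

```shell
#!/bin/bash
# Run each input line as one job, at most 3 at a time (-P 3).
# In the real pipeline, replace the echo with something like:
#   hive -hiveconf "some_var"="$some_value" -f "$0" 2>&1 | tee "$LOG_DIR/$0.log"
printf '%s\n' myscript_1.hql myscript_2.hql myscript_3.hql \
    | xargs -n 1 -P 3 sh -c 'echo "running $0"'
```

Because `xargs` returns a non-zero status (123) when any invocation fails, each step can be gated on `$?` instead of a hand-rolled fail counter.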
Upvotes: 1
Reputation: 533
You can use any tool for workflow management; the best practice depends on your use case and expertise.
Traditionally, in corporate environments: Control-M or a cron scheduler can be used.
From the big data ecosystem: Oozie or Azkaban.
There are several other tools out there that can be used for workflow management.
Upvotes: 1