ravi

Reputation: 1088

Oozie iterative workflow

I am building an application to ingest data from a MySQL DB into Hive tables. The app will be scheduled to run every day.

The very first action reads a Hive table to load the import-table info (e.g., name, type) and writes a file listing the tables to import. Next, a Sqoop action transfers the data for each table in sequence.

Is it possible to create a shell-script Oozie action that iterates through the table list and launches an Oozie sub-workflow Sqoop action for each table in sequence? Could you provide a reference? Any suggestion of a better approach is also welcome!
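
To make the idea concrete, the wiring I have in mind is a shell action at the top of the workflow that runs the looping script. This is only a rough sketch, not a tested configuration; the workflow name, script name, and app-path property (ingest-wf, import_tables.sh, ${wfAppPath}) are placeholders:

<workflow-app name="ingest-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="import-tables"/>
    <action name="import-tables">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>import_tables.sh</exec>
            <!-- Ship the script from the workflow app directory to the launcher. -->
            <file>${wfAppPath}/import_tables.sh#import_tables.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Import failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>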

Upvotes: 1

Views: 195

Answers (1)

ravi

Reputation: 1088

I have come up with the following shell script containing a Sqoop import. It works fine with some environment-variable tweaking.

#!/bin/bash
# Directory of table-metadata files and the staging area for imported data.
hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'

if hadoop fs -test -e "$hdfs_path"
then
  # Iterate over every metadata file under $hdfs_path; each file holds one table name.
  for file in $(hadoop fs -ls "$hdfs_path" | grep -o -e "$hdfs_path/*.*")
  do
    echo "${file}"
    TABLENAME=$(hadoop fs -cat "${file}")
    echo "$TABLENAME"
    # Import the current table (not a hardcoded one) into its own target directory.
    sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
      --table "$TABLENAME" --username retail_dba --password cloudera \
      --direct -m 1 --delete-target-dir --target-dir "$table_temp_path/$TABLENAME"
  done
fi
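
A couple of notes on running this from an Oozie shell action: the script has to be shipped to the launcher with a <file> element in the action definition, and the environment-variable tweaking typically amounts to things like exporting HADOOP_USER_NAME so the import writes to HDFS as the intended user rather than yarn. For anything beyond a sandbox, passing the credentials via Sqoop's --password-file option (pointing at a permission-restricted file on HDFS) is safer than a plaintext --password on the command line.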

Upvotes: 1
