Reputation: 65
Context: an Oozie pipeline on a Hadoop cluster that runs a series of Java actions (Apache Crunch jobs).
Issue: one of the Java actions inside a subworkflow finishes without errors but never writes its output to HDFS.
Things I've seen:
Example of the Oozie pipeline:
Java_Action_1 (points to a Java class that is run)
Java_Action_2 (points to a Java class that is run)
Java_Action_3 (points to a Java class that is run)
Subworkflow_1 (has a fork and join step; seen in the Oozie UI, see the sketch after this list)
Java_Action_1_in_subworkflow (points to a Java class that is run) -> the job that is not writing to HDFS
Java_Action_2_in_subworkflow (points to a Java class that is run)
Java_Action_4 (points to a Java class that is run)
Java_Action_5 (points to a Java class that is run)
etc.
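For reference, a minimal sketch of what such a fork/join subworkflow could look like in Oozie's workflow XML. This is an assumption about the shape of the workflow, not the actual definition; the workflow name, action names, and main classes are hypothetical placeholders:

```xml
<workflow-app name="subworkflow_1" xmlns="uri:oozie:workflow:0.5">
    <start to="fork_actions"/>

    <!-- Fork runs both Java actions in parallel, as seen in the Oozie UI. -->
    <fork name="fork_actions">
        <path start="java_action_1_in_subworkflow"/>
        <path start="java_action_2_in_subworkflow"/>
    </fork>

    <action name="java_action_1_in_subworkflow">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Hypothetical class name for the Crunch job that fails to write. -->
            <main-class>com.example.CrunchJobOne</main-class>
        </java>
        <ok to="join_actions"/>
        <error to="fail"/>
    </action>

    <action name="java_action_2_in_subworkflow">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.CrunchJobTwo</main-class>
        </java>
        <ok to="join_actions"/>
        <error to="fail"/>
    </action>

    <!-- Join waits for both forked actions before the subworkflow ends. -->
    <join name="join_actions" to="end"/>

    <kill name="fail">
        <message>Subworkflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```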
Upvotes: 0
Views: 23
Reputation: 65
The issue was with the fs.defaultFS Hadoop property. We were using viewfs, and the output paths given to Apache Crunch were prefixed with viewfs://; because of this the job was not able to write to HDFS. So we set fs.defaultFS to hdfs:// for the writing phase. The reading is from an S3 bucket that is mounted as /folder_name on HDFS, so for the reading phase the files still had to be prefixed with viewfs://.
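A minimal sketch of how this fix might look in a Crunch driver. The hostname, port, class name, and paths (namenode-host:8020, MyCrunchDriver, /folder_name/input, /output/dir, the viewfs mount name "cluster") are hypothetical placeholders, not taken from the actual job:

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.hadoop.conf.Configuration;

public class MyCrunchDriver {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Unqualified paths resolve against fs.defaultFS, so pointing it
        // at plain HDFS lets the write phase bypass the viewfs mount table.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        Pipeline pipeline = new MRPipeline(MyCrunchDriver.class, conf);

        // Reads still go through the viewfs mount that fronts the S3
        // bucket, so the input path keeps its viewfs:// prefix.
        PCollection<String> lines =
                pipeline.readTextFile("viewfs://cluster/folder_name/input");

        // The output path is left unqualified so it resolves to hdfs://
        // through the fs.defaultFS value set above.
        pipeline.writeTextFile(lines, "/output/dir");

        pipeline.done();
    }
}
```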
Upvotes: 0