user5230029

Reputation: 1

How to load text files into HDFS through an Oozie workflow in a cluster

I am trying to load text/CSV files with Hive scripts using Oozie, scheduled on a daily basis. The text files are on the local Unix file system.

I need to put those text files into HDFS before executing the Hive scripts in an Oozie workflow.

In a real cluster we don't know which node the job will run on; it may run on any node in the cluster.

Can anyone provide me with a solution?

Thanks in advance.

Upvotes: 0

Views: 985

Answers (1)

Samson Scharfrichter

Reputation: 9067

Not sure I understand what you want to do.

The way I see it, it can't work:

  • Oozie server has access to HDFS files only (same as Hive)
  • your data is on a local filesystem somewhere

So why don't you load your files into HDFS beforehand? The transfer may be triggered either when the files are available (a post-processing action in the upstream job) or at a fixed time (using Linux cron).
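For example, here is a minimal sketch of a transfer script that cron could run daily on the edge node. The directory names are assumptions to adapt, and it requires the Hadoop client (`hdfs dfs`) to be installed on that box, unlike the WebHDFS option below:

    #!/usr/bin/env python
    """Copy new local CSV files into HDFS; intended to be run from cron.
    Assumes the Hadoop client is installed and the paths below are adjusted."""
    import glob
    import subprocess

    LOCAL_DIR = "/data/incoming"        # assumption: where the upstream job drops files
    HDFS_DIR = "/user/hive/incoming"    # assumption: directory the Hive script reads from

    for path in glob.glob(LOCAL_DIR + "/*.csv"):
        # -f overwrites a file of the same name already present in HDFS
        subprocess.check_call(["hdfs", "dfs", "-put", "-f", path, HDFS_DIR + "/"])

A crontab entry such as `0 1 * * * python /opt/scripts/push_to_hdfs.py` (path hypothetical) would trigger it once a day, before the Oozie coordinator launches the Hive action.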

You don't even need the Hadoop libraries on the Linux box if the WebHDFS service is active on your NameNode - just use curl and an HTTP upload.
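As an illustration, here is a hedged Python sketch of that WebHDFS upload (the same two-step PUT that curl would perform); the NameNode host, port (50070 is the Hadoop 2.x default), HDFS path, and user name are all assumptions to adapt:

    import requests

    NAMENODE = "http://namenode.example.com:50070"  # assumption: NameNode host and WebHDFS port
    HDFS_PATH = "/user/hive/incoming/sales.csv"     # assumption: target path in HDFS
    LOCAL_FILE = "/data/incoming/sales.csv"         # assumption: file on the local filesystem
    HDFS_USER = "hive"                              # assumption: user for pseudo-authentication

    # Step 1: ask the NameNode where to write. With op=CREATE it answers with a
    # 307 redirect whose Location header points at a DataNode; no data is sent yet.
    create_url = ("%s/webhdfs/v1%s?op=CREATE&overwrite=true&user.name=%s"
                  % (NAMENODE, HDFS_PATH, HDFS_USER))
    resp = requests.put(create_url, allow_redirects=False)
    resp.raise_for_status()
    datanode_url = resp.headers["Location"]

    # Step 2: stream the file body to the DataNode URL; HTTP 201 means the file
    # now exists in HDFS.
    with open(LOCAL_FILE, "rb") as f:
        upload = requests.put(datanode_url, data=f)
    upload.raise_for_status()
    print("Uploaded %s -> %s (%d)" % (LOCAL_FILE, HDFS_PATH, upload.status_code))

The curl equivalent is a first PUT without a body to obtain the redirect, then a second PUT with `-T <file>` against the returned DataNode URL.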

Upvotes: 1
