WilD

Reputation: 57

Load and process data in parallel inside Hadoop

I am using Hadoop to process big data. I first load the data into HDFS and then execute jobs, but this happens sequentially. Is it possible to do it in parallel? For example, running 3 jobs and 2 data-load processes (feeding other jobs) at the same time on my cluster.

Cheers

Upvotes: 0

Views: 178

Answers (2)

anand

Reputation: 326

It is possible to run all the jobs in parallel in Hadoop if your cluster and jobs satisfy the criteria below:

1) The Hadoop cluster should be able to run a reasonable number of map/reduce tasks in parallel (how many depends on the jobs), i.e. it should have enough map/reduce slots.

2) If a job depends on data that is being loaded by another process, that data load and that job cannot run in parallel.

If your processes satisfy the above conditions, you can run all the jobs in parallel.

Using Oozie, you can schedule all the processes to run in parallel. The fork and join control nodes in an Oozie workflow let you run tasks in parallel and wait for all of them to finish.
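To illustrate the fork/join idea without Oozie, here is a minimal Python sketch that starts several independent commands at once and waits for all of them. It is not Oozie and not the only way to do this, just the same pattern driven from a script. The local paths, HDFS paths, jar names and class names are hypothetical placeholders, and `fork_join` and `example_commands` are names made up for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one command and return its exit code (0 means success)."""
    return subprocess.run(cmd).returncode


def fork_join(commands, runner=run_command, max_parallel=5):
    """Start all commands concurrently (fork) and wait for all of them (join)."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns the exit codes in the same order as the commands.
        return list(pool.map(runner, commands))


def example_commands():
    """Two data loads and three jobs; every path and name here is hypothetical."""
    return [
        # Loads that feed jobs which will run later, not the jobs listed below.
        ["hdfs", "dfs", "-put", "/local/data/a.csv", "/data/incoming/a"],
        ["hdfs", "dfs", "-put", "/local/data/b.csv", "/data/incoming/b"],
        # Jobs that read data already fully loaded into HDFS.
        ["hadoop", "jar", "job1.jar", "com.example.Job1", "/data/ready/x", "/out/job1"],
        ["hadoop", "jar", "job2.jar", "com.example.Job2", "/data/ready/y", "/out/job2"],
        ["hadoop", "jar", "job3.jar", "com.example.Job3", "/data/ready/z", "/out/job3"],
    ]
```

Calling `fork_join(example_commands())` on a machine with the Hadoop client installed would submit all five commands at the same time and return their exit codes. Whether the jobs actually execute concurrently still depends on the cluster having free resources.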

Upvotes: 1

RojoSam

Reputation: 1496

If your cluster has enough resources to run the jobs in parallel, then yes. But be sure that the work of each job doesn't interfere with the others. For example, loading data at the same time that a running job is reading it won't work as you expect.
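One way to check for that kind of interference before launching anything is to write down which HDFS paths each task reads and writes, then flag pairs that overlap. This is only a sketch: the task names and paths are hypothetical, `conflicts` and `unsafe_pairs` are names invented here, and paths are compared by exact string match, so a parent directory and a file inside it would not be detected as overlapping.

```python
def conflicts(task_a, task_b):
    """Return True if one task writes a path that the other reads or writes."""
    return bool(
        task_a["writes"] & (task_b["reads"] | task_b["writes"])
        or task_b["writes"] & task_a["reads"]
    )


def unsafe_pairs(tasks):
    """List the pairs of task names that should not run at the same time."""
    names = sorted(tasks)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if conflicts(tasks[a], tasks[b])
    ]


# Hypothetical tasks: one load and two jobs.
tasks = {
    "load_a": {"reads": set(), "writes": {"/data/a"}},
    "job1": {"reads": {"/data/a"}, "writes": {"/out/1"}},
    "job2": {"reads": {"/data/b"}, "writes": {"/out/2"}},
}
print(unsafe_pairs(tasks))
```

Here `job1` reads the path that `load_a` is still writing, so that pair must run one after the other, while `job2` can run alongside either of them.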

If there are not enough resources, Hadoop will queue the jobs until resources become available, depending on the scheduler that is configured.

Upvotes: 0
