Tom Sebastian

Reputation: 3433

How to give the output of one MapReduce job as input to another MapReduce job?

I am new to MapReduce and Hadoop. I have read examples and design patterns of MapReduce...

OK, to come to the point: we are developing software that monitors systems and captures their CPU usage regularly, say every 5 seconds, and we plot usage graphs per system over a range of time periods, e.g. CPU usage for the last 12 hours, the last week, etc. We were using an Oracle database for this, and we are currently planning to move to Hadoop.

We had discussed and proposed a mapreduce design as follows:

We would run 2 MapReduce jobs:

First job:

Collect the persisted data for all systems and group (reduce) it by system id, with output like:

pc-1 : [ list of recorded cpu usages (one every 5 sec) ]

This will then be fed to the next job.

Second job:

input: [ list of recorded cpu usages (one every 5 sec) for a system ]

This job will then group and reduce the data to an output format like:


last 12 hrs : 20%(average)
last 24 hrs : 28%(average)
last week   : 10%(average) ....
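For the chaining itself, our idea was to point the second job's input path at the first job's output directory, roughly like this driver sketch (the mapper/reducer class names here are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UsageDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path raw = new Path(args[0]);       // persisted cpu samples
        Path grouped = new Path(args[1]);   // intermediate: job 1 output
        Path averages = new Path(args[2]);  // final: job 2 output

        Job first = Job.getInstance(conf, "group usage by system id");
        first.setJarByClass(UsageDriver.class);
        first.setMapperClass(GroupMapper.class);     // placeholder class
        first.setReducerClass(GroupReducer.class);   // placeholder class
        FileInputFormat.addInputPath(first, raw);
        FileOutputFormat.setOutputPath(first, grouped);
        if (!first.waitForCompletion(true)) System.exit(1);

        // job 2 simply reads what job 1 wrote
        Job second = Job.getInstance(conf, "average usage per time range");
        second.setJarByClass(UsageDriver.class);
        second.setMapperClass(AverageMapper.class);  // placeholder class
        second.setReducerClass(AverageReducer.class);
        FileInputFormat.addInputPath(second, grouped);
        FileOutputFormat.setOutputPath(second, averages);
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}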

Is this possible, or is our approach wrong? Please help...

Upvotes: 1

Views: 993

Answers (2)

griffon vulture

Reputation: 6764

Only one job is necessary. The map task will output key: system-id, value: (cpu-usage, date).

The reduce task will then output, for each system-id, the average over each requested time range.

The map output value will be a custom class that implements Writable.

You didn't provide an exact example, but something like this:

Map:

protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // input line: systemId \t cpuUsage \t date
    String[] fields = value.toString().split("\t");
    context.write(new Text(fields[0]), new Details(fields[1], fields[2]));
}

Reduce:

DoubleWritable average = new DoubleWritable();

protected void reduce(Text key, Iterable<Details> values, Context context) throws IOException, InterruptedException {
    // one slot per requested range: [0] = last 24 hours, [1] = last week, ...
    int[] sums = new int[2];
    int[] counts = new int[2];
    for (Details value : values) {
        // for last 24 hours (date-range checks live on the Details class)
        if (value.isWithinLastDay()) {
            sums[0] += value.getUsage();
            counts[0]++;
        }
        // for last week
        if (value.isWithinLastWeek()) {
            sums[1] += value.getUsage();
            counts[1]++;
        }
    }
    // for last 24 hours
    if (counts[0] > 0) {
        average.set(sums[0] / (double) counts[0]);
        context.write(new Text(key + "\tlast 24 hrs"), average);
    }
    // for last week
    if (counts[1] > 0) {
        average.set(sums[1] / (double) counts[1]);
        context.write(new Text(key + "\tlast week"), average);
    }
    // ...and so on for the other ranges
}
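For completeness, here is a minimal sketch of what the Details value class could look like. The field names, getter/range-check methods, and the assumption that the date field is epoch milliseconds are mine, not from the question:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class Details implements Writable {
    private int usage;       // cpu usage sample, e.g. a percentage
    private long timestamp;  // assumed: sample time as epoch millis

    public Details() {}      // Hadoop requires a no-arg constructor

    public Details(String usage, String timestamp) {
        this.usage = Integer.parseInt(usage);
        this.timestamp = Long.parseLong(timestamp);
    }

    public int getUsage() { return usage; }

    public boolean isWithinLastDay() {
        return timestamp >= System.currentTimeMillis() - 24L * 60 * 60 * 1000;
    }

    public boolean isWithinLastWeek() {
        return timestamp >= System.currentTimeMillis() - 7 * 24L * 60 * 60 * 1000;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(usage);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        usage = in.readInt();
        timestamp = in.readLong();
    }
}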

Upvotes: 1

Thejas

Reputation: 61

Two separate MR jobs are not necessary.

MR JOB:

Map phase - output: {systemId : [list of CPU usage values]}

Reducer phase - calculate the average and the other requested information.
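A minimal driver for this single-job approach might look like the sketch below; the mapper/reducer class names are assumptions, matching the classes sketched in the other answer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cpu usage averages");
        job.setJarByClass(SingleJobDriver.class);
        job.setMapperClass(UsageMapper.class);    // emits (systemId, Details)
        job.setReducerClass(UsageReducer.class);  // emits (systemId, average)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Details.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}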

If you can provide sample input data, a more detailed key-value pair description can be given.

Why don't you use a system like Nagios, which does this kind of monitoring work?

Upvotes: 0
