Robert Almeida

Reputation: 107

How to get the max count over two keys in Java MapReduce (Hadoop)

I have a txt file with 6 columns and I'm interested in the third and fourth columns, city and product. Here's a sample:

2015-01-01;09:00:00;New York;shoes;214.05;Amex

I need to get the product with the max sales per city. I already have the code to aggregate and count all products by city. Here's the code of the mapper and reducer classes:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ContaMaxCidadeProdutoMapper extends Mapper<Object, Text, Text, IntWritable> {

	private final static Text cidadeproduto = new Text();
	private final static IntWritable numeroum = new IntWritable(1);

	public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
		
		String[] linha=value.toString().split(";");		
		cidadeproduto.set(linha[2] +" "+linha[3]);
		context.write(cidadeproduto, numeroum);		
	}
}

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ContaMaxCidadeProdutoReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
	
	public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
		int contValue = 0;
		
		for (IntWritable value : values) {
			contValue += value.get();
		}
		
		context.write(key, new IntWritable(contValue));
	}
}

It works correctly for counting each product by city, but now I need to get the product with the max count per city. I know how to get the max-count product for the whole data set, but I don't know how to get it per city. I'd appreciate any tips! Thanks

Upvotes: 4

Views: 1556

Answers (2)

Viacheslav Shalamov

Reputation: 4417

You want the product with the max count per city. As I understand it, for each city you want the product with the most sales in that particular city, right?

I'd do it with two MapReduce jobs (two mapper/reducer pairs). The first pair is similar to yours:

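public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    String[] linha = value.toString().split(";");
    // Join city and product with "&" so the second job can split them apart again
    cidadeproduto.set(linha[2] + "&" + linha[3]);
    context.write(cidadeproduto, new IntWritable(1));
}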

public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int contValue = 0;

    // Sum the 1s emitted for this city&product key
    for (IntWritable value : values) {
        contValue += value.get();
    }
    context.write(key, new IntWritable(contValue));
}

Then the second pair, which reads the output of the first.
Its mapper regroups the data so that the city is the key and product&count is the value:

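public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    // Assumes the first job wrote "city&product<TAB>count" lines,
    // which is what the default TextOutputFormat produces
    String[] row = value.toString().split("\t");
    String[] cityProduct = row[0].split("&");
    String city = cityProduct[0];
    String product = cityProduct[1];
    String count = row[1];
    context.write(new Text(city), new Text(product + "&" + count));
}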

The reducer then keeps track of the maximum count for each city:

public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    int maxVal = Integer.MIN_VALUE;
    String maxProd = "None";

    // Each value is "product&count"; keep the product with the largest count
    for (Text value : values) {
        String[] ss = value.toString().split("&");
        int cnt = Integer.parseInt(ss[1]);
        if (cnt > maxVal) {
            maxVal = cnt;
            maxProd = ss[0];
        }
    }
    context.write(key, new Text(maxProd));
}
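
To check the logic of the two passes without a cluster, here is a minimal plain-Java sketch that mirrors them on an in-memory list of lines. The class and method names (`TopProductPerCity`, `countCityProduct`, `topProduct`) are made up for illustration and are not part of Hadoop. It assumes the `;`-separated layout from the question, with city and product in the third and fourth columns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java trace of the two passes; no Hadoop classes involved.
// All names here are hypothetical helpers for illustration only.
class TopProductPerCity {

    // Pass 1: count how often each "city&product" key occurs.
    static Map<String, Integer> countCityProduct(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] cols = line.split(";");
            counts.merge(cols[2] + "&" + cols[3], 1, Integer::sum);
        }
        return counts;
    }

    // Pass 2: regroup by city and keep the product with the highest count.
    static Map<String, String> topProduct(Map<String, Integer> counts) {
        Map<String, String> best = new HashMap<>();
        Map<String, Integer> bestCount = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] parts = e.getKey().split("&");
            String city = parts[0];
            String product = parts[1];
            int cnt = e.getValue();
            // Strict ">" means a tie keeps whichever product was seen first
            if (cnt > bestCount.getOrDefault(city, Integer.MIN_VALUE)) {
                bestCount.put(city, cnt);
                best.put(city, product);
            }
        }
        return best;
    }
}
```

Calling `topProduct(countCityProduct(lines))` returns a map from each city to one of its best-selling products. When two products tie, the winner depends on iteration order, and the same caveat applies to the reducer, since Hadoop does not guarantee the order of values within a key.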

Upvotes: 2

Joe C

Reputation: 15684

I shall start by explaining the fundamentals of map/reduce, of which there are two basic parts:

  • Map: Convert your raw input into a value you can work with (in your case, a city/product pair and a number)
  • Reduce: For each city/product pair, sum all of the numbers.

In your current application, you have selected the number 1, regardless of what the input looks like. Summing a bunch of 1s has the same effect as counting them.

Instead, you will want to map it to another value, by extracting it from your input string and parsing it into a Double, and sending that in place of numeroum.
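
As a rough sketch of that extraction step, assuming the value meant here is the amount in the fifth column of the sample line (214.05), a helper could look like the following. `SaleParser` and `parseAmount` are hypothetical names, not part of Hadoop.

```java
// Hypothetical helper: pulls the sale amount out of one input line.
class SaleParser {

    // Assumes ';'-separated columns with the amount at index 4 (fifth column).
    static double parseAmount(String line) {
        String[] cols = line.split(";");
        return Double.parseDouble(cols[4].trim());
    }
}
```

In the mapper, the parsed amount would be emitted instead of `numeroum`, for example wrapped in Hadoop's `DoubleWritable`, and the mapper's and reducer's value types would need to change from `IntWritable` to match.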

Upvotes: 0
