Reputation: 107
I have a txt file with 6 columns and I'm interested in the third and fourth columns, City and product. Here's a sample:
2015-01-01;09:00:00;New York;shoes;214.05;Amex
I need to get the product with max sales by City. I already have the code to aggregate and count all products by city. Here is the code of the mapper and reducer classes:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class ContaMaxCidadeProdutoMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static Text cidadeproduto = new Text();
    private final static IntWritable numeroum = new IntWritable(1);

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String[] linha = value.toString().split(";");
        cidadeproduto.set(linha[2] + " " + linha[3]);
        context.write(cidadeproduto, numeroum);
    }
}
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class ContaMaxCidadeProdutoReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int contValue = 0;
        for (IntWritable value : values) {
            contValue += value.get();
        }
        context.write(key, new IntWritable(contValue));
    }
}
It's working correctly to get the count of each product by City, but now I need to get the product with the max count by City. I know how to get the max count product of the whole data set, but I don't know how to get it by City. I'd appreciate any tips! Thanks
Upvotes: 4
Views: 1556
Reputation: 4417
You want to get the product with the max count by City. As I understand it, for each city you want the product with the most sales in that particular city, right?
I'd do it with two MapReduce (M-R) pairs. The first pair is similar to yours:
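public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    String[] linha = value.toString().split(";");
    // "&" separates city and product so the second job can split the key again
    cidadeproduto.set(linha[2] + "&" + linha[3]);
    context.write(cidadeproduto, new IntWritable(1));
}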
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int contValue = 0;
    for (IntWritable value : values) {
        contValue += value.get();
    }
    context.write(key, new IntWritable(contValue));
}
And the second pair.
The mapper regroups your data so that the city becomes the key and product&count becomes the value:
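public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    // The first job's default text output is "city&product<TAB>count", so split on the tab
    String[] row = value.toString().split("\t");
    String[] cityProduct = row[0].split("&");
    String city = cityProduct[0];
    String product = cityProduct[1];
    String count = row[1];
    context.write(new Text(city), new Text(product + "&" + count));
}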
The reducer then keeps track of the maximum count for each city:
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    int maxVal = Integer.MIN_VALUE;
    String maxProd = "None";
    for (Text value : values) {
        // Each value is "product&count"
        String[] ss = value.toString().split("&");
        int cnt = Integer.parseInt(ss[1]);
        if (cnt > maxVal) {
            maxVal = cnt;
            maxProd = ss[0];
        }
    }
    context.write(key, new Text(maxProd));
}
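If you want to check the logic before running it on a cluster, here is a minimal plain-Java sketch of the same two-stage flow. It uses no Hadoop classes at all: the class name MaxProductByCitySketch and both method names are made up for illustration, and every sample row except the first one from the question is invented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the two M-R pairs; hypothetical names, no Hadoop involved.
class MaxProductByCitySketch {

    // Stage 1 (first map + reduce): count rows per "city&product" key.
    static Map<String, Integer> countByCityProduct(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.split(";");
            counts.merge(cols[2] + "&" + cols[3], 1, Integer::sum);
        }
        return counts;
    }

    // Stage 2 (second map + reduce): regroup by city, keep the product with the highest count.
    static Map<String, String> maxProductByCity(Map<String, Integer> counts) {
        Map<String, Integer> bestCount = new HashMap<>();
        Map<String, String> bestProduct = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] parts = e.getKey().split("&");
            String city = parts[0];
            String product = parts[1];
            int cnt = e.getValue();
            // Strict ">" means the first product seen wins a tie
            if (cnt > bestCount.getOrDefault(city, Integer.MIN_VALUE)) {
                bestCount.put(city, cnt);
                bestProduct.put(city, product);
            }
        }
        return bestProduct;
    }

    public static void main(String[] args) {
        // First row is the sample from the question; the rest are invented test data
        List<String> lines = Arrays.asList(
            "2015-01-01;09:00:00;New York;shoes;214.05;Amex",
            "2015-01-01;09:05:00;New York;shoes;10.00;Visa",
            "2015-01-01;09:10:00;New York;toys;5.00;Cash",
            "2015-01-01;09:15:00;Boston;books;7.50;Visa");
        System.out.println(maxProductByCity(countByCityProduct(lines)));
    }
}
```

One thing the sketch makes visible: on a tie, whichever product is seen first wins. In a real job the order of values reaching the reducer isn't something I'd rely on, so if ties matter you may want an explicit tie-break (for example, compare product names).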
Upvotes: 2
Reputation: 15684
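I shall start by explaining the fundamentals of map/reduce, of which there are two basic parts: the map step, which turns each input record into a key/value pair, and the reduce step, which combines all the values that share a key.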
In your current application, the mapper always emits the number 1 as the value, regardless of what the input looks like. Summing a bunch of 1s has the same effect as counting them.
Instead, you will want to map each record to another value: extract the value you want to aggregate (for example the sale amount) from your input string, parse it into a Double, and send that in place of numeroum.
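As a rough illustration of that idea, here is a plain-Java sketch with no Hadoop classes. It assumes the amount is the fifth column (214.05 in the question's sample row); the names SalesAmountSketch, parseAmount and totalByCityProduct are made up, and the second sample row is invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: sum sale amounts per "city product" key instead of counting 1s.
class SalesAmountSketch {

    // The value the mapper would emit: column 5 parsed as a double (assumed to be the amount).
    static double parseAmount(String line) {
        return Double.parseDouble(line.split(";")[4]);
    }

    // Stands in for the reducer: add up the amounts for each key.
    static Map<String, Double> totalByCityProduct(List<String> lines) {
        Map<String, Double> totals = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.split(";");
            totals.merge(cols[2] + " " + cols[3], parseAmount(line), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // First row is from the question; the second is invented test data
        List<String> lines = Arrays.asList(
            "2015-01-01;09:00:00;New York;shoes;214.05;Amex",
            "2015-01-01;09:05:00;New York;shoes;10.00;Visa");
        System.out.println(totalByCityProduct(lines));
    }
}
```

In the actual job you would probably emit a DoubleWritable instead of the IntWritable numeroum and change the mapper's and reducer's value types to match.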
Upvotes: 0