Reputation: 73
I have the input data below in JSON format.
"SeasonTicket": false,
"name": "Vinson Foreman",
"gender": "male",
"age": 50,
"email": "[email protected]",
"annualSalary": "$98501.00",
"id": 0
I need to group the records by salary range, i.e. 1000-10000, 10000-25000 and so on, and count how many records fall into each range:
Range Count
1000-10000 10
10000-50000 20
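To make the target output concrete, here is a minimal plain-Java sketch of the grouping itself, outside of Hadoop. The class name `SalaryRangeCount`, the method names, and the exact boundaries (lower bound inclusive, upper bound exclusive, with an open-ended top range) are assumptions for illustration, not part of the job above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SalaryRangeCount {
    // Hypothetical boundaries: lower bound inclusive, upper bound exclusive.
    static String rangeLabel(double salary) {
        if (salary < 1000) return "below-1000";
        if (salary < 10000) return "1000-10000";
        if (salary < 50000) return "10000-50000";
        if (salary < 100000) return "50000-100000";
        return "100000+";
    }

    // Counts how many salaries fall into each range label.
    static Map<String, Integer> countByRange(double[] salaries) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (double s : salaries) {
            counts.merge(rangeLabel(s), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] salaries = {98501.00, 4500.00, 12000.00, 10000.00};
        countByRange(salaries)
                .forEach((range, n) -> System.out.println(range + "\t" + n));
    }
}
```

In a MapReduce job the same idea would split in two: the mapper emits the range label as the key with a count of 1, and the reducer sums the counts per label.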
I am not using the default JSON parser or Jackson to process the data; I am parsing each line as a String. I have the map and reduce functions below.
Map function
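import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DemoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final IntWritable v = new IntWritable(1);
    private final Text k = new Text();

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // "line" was never defined in the original; it is the text of the input record.
        String line = value.toString();
        if (!line.contains("annualSalary")) {
            return;
        }
        try {
            // replaceAll("$", "") treats "$" as a regex end anchor and removes nothing,
            // so keep only the digits and the decimal point after the colon instead.
            String t = line.substring(line.indexOf(':') + 1).replaceAll("[^0-9.]", "");
            double x = Double.parseDouble(t);
            // Emit the range label as the key (not the salary) so the reducer counts per range.
            // Lower bounds are inclusive so that values such as 10000 are not dropped.
            if (x >= 1000 && x < 10000) {
                k.set("1000-10000");
            } else if (x >= 10000 && x < 50000) {
                k.set("10000-50000");
            } else if (x >= 50000 && x < 100000) {
                k.set("50000-100000");
            } else if (x >= 100000) {
                k.set("100000+");
            } else {
                return; // below 1000: outside every listed range
            }
            output.collect(k, v);
        } catch (NumberFormatException ex) {
            ex.printStackTrace();
        }
    }
}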
Reduce function
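import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class DemoReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable count = new IntWritable();

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Sum the 1s emitted by the mapper for this key.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        count.set(sum);
        output.collect(key, count);
    }
}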
Please let me know how to group this data, without using a JSON parser if possible.
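For reference, this is the kind of String-only parsing I mean, as a minimal sketch. The class `SalaryLineParser` and the method `parseSalary` are hypothetical names, and the pattern assumes each field sits on its own line in the form shown in the sample above.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SalaryLineParser {
    // Matches a line such as:  "annualSalary": "$98501.00",
    // The "$" is escaped because it is a regex metacharacter.
    private static final Pattern SALARY = Pattern.compile(
            "\"annualSalary\"\\s*:\\s*\"\\$?([0-9]+(?:\\.[0-9]+)?)\"");

    // Returns the salary if the line carries one, otherwise an empty result.
    static OptionalDouble parseSalary(String line) {
        Matcher m = SALARY.matcher(line);
        if (!m.find()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(parseSalary("\"annualSalary\": \"$98501.00\","));
        System.out.println(parseSalary("\"age\": 50,"));
    }
}
```

Matching on the field name rather than on a fixed character offset avoids depending on how the line happens to be indented.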
Upvotes: 0
Views: 1085
Reputation: 176
To group the data by ranges you can use a custom partitioner - example.
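As a rough sketch of what the partitioning logic could look like: in the old `org.apache.hadoop.mapred` API a partitioner implements `Partitioner<K2, V2>` and its `getPartition(key, value, numPartitions)` method. The version below models only that decision in plain Java so it runs without Hadoop; the class name `SalaryRangePartitioner`, the `double` key, and the boundary values are assumptions for illustration.

```java
public class SalaryRangePartitioner {
    // Exclusive upper bounds of each range (illustrative values).
    // Salaries at or above the last bound go to the final, open-ended range.
    private static final double[] UPPER_BOUNDS = {10000, 50000, 100000};

    // Models the decision a Hadoop partitioner makes: map a key (here the
    // salary) to a reducer index in [0, numPartitions).
    static int getPartition(double salary, int numPartitions) {
        int bucket = UPPER_BOUNDS.length; // default: open-ended top range
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (salary < UPPER_BOUNDS[i]) {
                bucket = i;
                break;
            }
        }
        // Wrap around if the job runs with fewer reducers than ranges.
        return bucket % numPartitions;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (double s : new double[] {4500, 12000, 98501, 250000}) {
            System.out.println(s + " -> reducer " + getPartition(s, reducers));
        }
    }
}
```

Note that a partitioner only decides which reducer receives a key. With one reducer per range, each reducer's output covers a single range, but the job has to be configured with a matching number of reduce tasks.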
Upvotes: 1