Reputation: 13
I'm using Hadoop to analyze GSOD data. I chose five years (2005 - 2009) to run my experiments. I've configured a small cluster and executed a simple MapReduce program that gets the maximum temperature registered for a year.
Now I have to create a new MR program that counts, for each station, how many times each phenomenon occurred over those years.
The files that I have to analyze have this structure:
722115 110001
722115 011001
722110 111000
722110 001000
722000 001000
The STN column is the station code and the FRSHTT column encodes the phenomena: F - Fog, R - Rain or drizzle, S - Snow or ice pellets, H - Hail, T - Thunder, O - Tornado or funnel cloud.
A value of 1 means that the phenomenon occurred on that day; 0 means it did not occur.
I need results like the following:
722115: F = 1, R = 2, S = 1, O = 2
722110: F = 1, R = 1, S = 2
722000: S = 1
I could run the MR program, but the results are wrong. It gives me this output:
722115 F, 1
722115 R, 1
722115 R, 1
722115 S, 1
722115 O, 1
722115 O, 1
722110 F, 1
722110 R, 1
722110 S, 1
722110 S, 1
722000 S, 1
This is the code I used:
public class Mapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, StationPhenomenun, IntWritable> {
    protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // Every file starts with a field description line, so I ignore that line
        if (!line.startsWith("STN---")) {
            // The first field of the line is the code of the station where the data was collected
            String station = line.substring(0, 6);
            String fog = (line.substring(132, 133));
            String rainOrDrizzle = (line.substring(133, 134));
            String snowOrIcePellets = (line.substring(134, 135));
            String hail = (line.substring(135, 136));
            String thunder = (line.substring(136, 137));
            String tornadoOrFunnelCloud = (line.substring(137, 138));
            if (fog.equals("1"))
                context.write(new StationPhenomenun(station,"F"), new IntWritable(1));
            if (rainOrDrizzle.equals("1"))
                context.write(new StationPhenomenun(station,"R"), new IntWritable(1));
            if (snowOrIcePellets.equals("1"))
                context.write(new StationPhenomenun(station,"S"), new IntWritable(1));
            if (hail.equals("1"))
                context.write(new StationPhenomenun(station,"H"), new IntWritable(1));
            if (thunder.equals("1"))
                context.write(new StationPhenomenun(station,"T"), new IntWritable(1));
            if (tornadoOrFunnelCloud.equals("1"))
                context.write(new StationPhenomenun(station,"O"), new IntWritable(1));
        }
    }
}
public class Reducer extends org.apache.hadoop.mapreduce.Reducer<StationPhenomenun, IntWritable, StationPhenomenun, IntWritable> {
    protected void reduce(StationPhenomenun key, Iterable<IntWritable> values, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        String station = key.getStation().toString();
        String occurence = key.getPhenomenun().toString();
        StationPhenomenun textPair = new StationPhenomenun(station, occurence);
        context.write(textPair, new IntWritable(count));
    }
}
public class StationPhenomenun implements WritableComparable<StationPhenomenun> {
    private String station;
    private String phenomenun;

    public StationPhenomenun(String station, String phenomenun) {
        this.station = station;
        this.phenomenun = phenomenun;
    }

    public StationPhenomenun() {
    }

    public String getStation() {
        return station;
    }

    public String getPhenomenun() {
        return phenomenun;
    }

    public void readFields(DataInput in) throws IOException {
        station = in.readUTF();
        phenomenun = in.readUTF();
    }

    public void write(DataOutput out) throws IOException {
        // Must mirror readFields: same fields, same order
        out.writeUTF(station);
        out.writeUTF(phenomenun);
    }

    public int compareTo(StationPhenomenun t) {
        int cmp = this.station.compareTo(t.station);
        if (cmp != 0) {
            return cmp;
        }
        return this.phenomenun.compareTo(t.phenomenun);
    }

    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final StationPhenomenun other = (StationPhenomenun) obj;
        if (this.station != other.station && (this.station == null || !this.station.equals(other.station))) {
            return false;
        }
        if (this.phenomenun != other.phenomenun && (this.phenomenun == null || !this.phenomenun.equals(other.phenomenun))) {
            return false;
        }
        return true;
    }

    public int hashCode() {
        return this.station.hashCode() * 163 + this.phenomenun.hashCode();
    }
}
public class NcdcJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/station"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Has anyone done something similar?
PS: I have tried this solution (Hadoop - composite key), but it did not work for me.
Upvotes: 1
Views: 3217
Reputation: 2225
Check whether the following two classes match your custom implementation.
I was able to get the desired result with the following changes to the method signatures:
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
protected void reduce(StationPhenomenun key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
I also changed the class names to MyMapper and MyReducer.
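A likely reason the signature change matters: when `reduce` takes the raw `org.apache.hadoop.mapreduce.Reducer.Context` next to a parameterized `Iterable<IntWritable>`, Java appears to treat it as a new overload rather than an override, so the framework keeps calling the default pass-through `reduce`. That would match the repeated `, 1` rows in the question. The sketch below reproduces the effect in plain Java without Hadoop; `BaseReducer`, `RawContextReducer`, and `FixedReducer` are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.Arrays;
import java.util.List;

public class OverrideDemo {
    // Hypothetical stand-in for a generic framework reducer (not Hadoop's class).
    static class BaseReducer<K, V> {
        class Context {
            final StringBuilder out = new StringBuilder();

            void write(K key, V value) {
                out.append(key).append('=').append(value).append(';');
            }
        }

        // Default behaviour: pass every value through unchanged.
        protected void reduce(K key, Iterable<V> values, Context context) {
            for (V value : values) {
                context.write(key, value);
            }
        }

        // Plays the role of the framework calling reduce for one key.
        String run(K key, List<V> values) {
            Context context = new Context();
            reduce(key, values, context);
            return context.out.toString();
        }
    }

    static class RawContextReducer extends BaseReducer<String, Integer> {
        // Raw BaseReducer.Context makes this an overload, so run() never calls it.
        // Adding @Override here would turn the mistake into a compile error.
        @SuppressWarnings({"rawtypes", "unchecked"})
        protected void reduce(String key, Iterable<Integer> values, BaseReducer.Context context) {
            int sum = 0;
            for (int value : values) {
                sum += value;
            }
            context.write(key, sum);
        }
    }

    static class FixedReducer extends BaseReducer<String, Integer> {
        // The inherited, parameterized Context type makes this a real override.
        @Override
        protected void reduce(String key, Iterable<Integer> values, Context context) {
            int sum = 0;
            for (int value : values) {
                sum += value;
            }
            context.write(key, sum);
        }
    }

    public static void main(String[] args) {
        List<Integer> ones = Arrays.asList(1, 1);
        System.out.println(new RawContextReducer().run("722115 R", ones)); // 722115 R=1;722115 R=1;
        System.out.println(new FixedReducer().run("722115 R", ones));      // 722115 R=2;
    }
}
```

Annotating `map` and `reduce` with `@Override` in the real job is a cheap way to catch this kind of mismatch at compile time.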
With this input set, I got the following result:
StationPhenomenun [station=722000, phenomenun=S] 1
StationPhenomenun [station=722110, phenomenun=F] 1
StationPhenomenun [station=722110, phenomenun=R] 1
StationPhenomenun [station=722110, phenomenun=S] 2
StationPhenomenun [station=722115, phenomenun=F] 1
StationPhenomenun [station=722115, phenomenun=O] 2
StationPhenomenun [station=722115, phenomenun=R] 2
StationPhenomenun [station=722115, phenomenun=S] 1
The computation is the same; you only need to customize how the output is displayed.
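One way to get the `722115: F = 1, R = 2, ...` layout from the question is to group the counts by station and build one line per station, for example by keying the job on the station alone. Below is a minimal plain-Java sketch of that grouping and formatting step, without Hadoop. `StationSummary`, `count`, and `format` are hypothetical names, and the two-column input is the simplified excerpt from the question, not the real fixed-width GSOD record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StationSummary {
    // Letters the question uses for the six flag positions.
    static final String CODES = "FRSHTO";

    // Counts set flags per station from simplified "STN FLAGS" lines.
    static Map<String, int[]> count(List<String> lines) {
        Map<String, int[]> totals = new LinkedHashMap<>(); // keeps first-seen station order
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            int[] counts = totals.computeIfAbsent(parts[0], k -> new int[CODES.length()]);
            for (int i = 0; i < CODES.length(); i++) {
                if (parts[1].charAt(i) == '1') {
                    counts[i]++;
                }
            }
        }
        return totals;
    }

    // Builds one line per station, skipping phenomena that never occurred.
    static List<String> format(Map<String, int[]> totals) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : totals.entrySet()) {
            StringBuilder sb = new StringBuilder(entry.getKey()).append(":");
            String separator = " ";
            int[] counts = entry.getValue();
            for (int i = 0; i < CODES.length(); i++) {
                if (counts[i] > 0) {
                    sb.append(separator).append(CODES.charAt(i)).append(" = ").append(counts[i]);
                    separator = ", ";
                }
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "722115 110001", "722115 011001",
                "722110 111000", "722110 001000",
                "722000 001000");
        for (String line : format(count(sample))) {
            System.out.println(line);
        }
    }
}
```

In a MapReduce job the same formatting would sit in the reducer (or a second pass), with the station as the key and a text line as the output value.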
Upvotes: 1