Blue Diamond

Reputation: 3069

Hadoop sample map reduce program using Awk

I am familiar with Hadoop using Java. I am looking for a sample Hadoop MapReduce program written using AWK only.

For a text file containing:

A k1
B k1
C k2
D k3

I am looking for this output:

k1 2
k2 1
k3 1

Upvotes: 1

Views: 3113

Answers (2)

Charles Menguy

Reputation: 41428

I would advise using Hadoop streaming to do this. I'm not an Awk expert by any means, but taking @sudo_O's answer and transforming it into the Hadoop world, here is what I would do:

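  1. Write an Awk script that will be used as your mapper. You need only a mapper for this, no reducers. (This holds as long as a single map task reads the whole input; if the input is split across several map tasks, each one emits its own partial counts, so the same key can appear more than once in the output.)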

    $ cat mapper.awk
    #!/usr/bin/awk -f
    
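    # Count how many times each value of the second field occurs.
    { a[$2]++ }
    # After the last input line, print each key with its count.
    END { for (k in a) print k, a[k] }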
    
  2. You can run your Hadoop streaming job as follows:

    ${HADOOP_HOME}/bin/hadoop \
    jar ${HADOOP_HOME}/contrib/streaming/*.jar \
    -D mapreduce.job.reduces=0 \
    -D mapred.reduce.tasks=0 \
    -input /path/to/input.txt \
    -output /path/to/output/dir \
    -mapper mapper.awk \
    -file /path/to/mapper.awk
    
  3. You can view the results in HDFS with:

    hadoop fs -cat /path/to/output/dir/*
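
If the input is large enough to be split across several map tasks, the map-only job above can emit the same key more than once, each time with a partial count. A common fix is to have the mapper emit one `key<TAB>1` pair per line and let a reducer do the summing. The sketch below tests such a pair locally, with `sort` standing in for Hadoop's shuffle and sort phase. It assumes a POSIX shell and awk; the script names `count_mapper.awk` and `count_reducer.awk` are hypothetical.

```shell
#!/bin/sh
# Work in a throwaway directory so the sketch does not touch existing files.
cd "$(mktemp -d)" || exit 1

# Sample input from the question.
printf 'A k1\nB k1\nC k2\nD k3\n' > input.txt

# Hypothetical mapper: emit "key<TAB>1" for every input line.
# Hadoop streaming treats the text before the first tab as the key by default.
cat > count_mapper.awk <<'EOF'
#!/usr/bin/awk -f
{ print $2 "\t" 1 }
EOF

# Hypothetical reducer: its input arrives sorted by key,
# so it sums each run of equal keys and prints the total when the key changes.
cat > count_reducer.awk <<'EOF'
#!/usr/bin/awk -f
BEGIN { FS = "\t" }
$1 != prev { if (NR > 1) print prev, sum; prev = $1; sum = 0 }
{ sum += $2 }
END { if (NR > 0) print prev, sum }
EOF

# Local stand-in for map -> shuffle/sort -> reduce.
awk -f count_mapper.awk input.txt | LC_ALL=C sort | awk -f count_reducer.awk
```

On a cluster, the same two scripts would presumably be passed with `-mapper count_mapper.awk -reducer count_reducer.awk`, each shipped with its own `-file` option, and the two `-D ...=0` reducer settings dropped so that Hadoop actually runs the reduce phase.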
    

Upvotes: 4

Chris Seymour

Reputation: 85785

This will do the trick:

$ awk '{a[$2]++}END{for(k in a)print k,a[k]}' file
k1 2
k2 1
k3 1
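
The one-liner above can be unpacked with comments as follows. One caveat: awk does not specify the order in which `for (k in a)` visits keys, so the keys may come out in any order; piping through `sort` makes the output deterministic. This is a sketch assuming a POSIX shell, with the sample file from the question recreated in a temporary directory.

```shell
#!/bin/sh
# Recreate the question's sample input in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf 'A k1\nB k1\nC k2\nD k3\n' > file

# Same logic as the one-liner, spread out with comments.
awk '
  # Count occurrences of each value of the second field.
  { a[$2]++ }
  # After the last line, print each key and its count.
  END { for (k in a) print k, a[k] }
' file | LC_ALL=C sort
```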

Upvotes: 4
