Hadoop sample map reduce program using Awk

Question

I am familiar with Hadoop using Java. I am looking for a sample Hadoop MapReduce program written using AWK only.

For a text file containing...

A k1
B k1
C k2
D k3

I am looking for this output:

k1 2
k2 1
k3 1

Charles Menguy · Accepted Answer

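I would advise using Hadoop streaming to do this. I'm not an Awk expert by any means, but taking @sudo_O's answer and adapting it to the Hadoop world, here is what I would do:

Write an Awk script that will be used as your mapper. You need only a mapper for this; no reducer is required. Note that each map task prints counts only for the input split it reads, so the totals are global only when the whole input is handled by a single mapper.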
```
$ cat mapper.awk
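#!/usr/bin/awk -f

# Count how many times each key (second field) occurs.
{ a[$2]++ }

# After all input is read, print every key with its count.
END { for (k in a) print k, a[k] }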
```
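Before submitting the job, the mapper logic can be checked locally on the sample input from the question. This is only a sketch: the awk program is inlined rather than read from `mapper.awk`, and the trailing `sort` is added here because awk does not guarantee the iteration order of `for (k in a)`.

```shell
# Feed the question's sample records to the same awk logic as mapper.awk.
# sort makes the output order deterministic.
result=$(printf 'A k1\nB k1\nC k2\nD k3\n' \
  | awk '{ a[$2]++ } END { for (k in a) print k, a[k] }' \
  | sort)
echo "$result"
```

This should print the three lines asked for in the question: `k1 2`, `k2 1`, `k3 1`.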

You can run the Hadoop streaming job as follows:

```
${HADOOP_HOME}/bin/hadoop \
    jar ${HADOOP_HOME}/contrib/streaming/*.jar \
    -D mapreduce.job.reduces=0 \
    -D mapred.reduce.tasks=0 \
    -input /path/to/input.txt \
    -output /path/to/output/dir \
    -mapper mapper.awk \
    -file /path/to/mapper.awk
```

You can view the results in HDFS with:
```
hadoop fs -cat /path/to/output/dir/*
```
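If the input is large enough to be split across several map tasks, a mapper-only job produces one set of partial counts per task. A common alternative, sketched here and not part of the original answer, is to have the mapper emit one tab-separated `key 1` pair per record and let a reducer add them up. The function names `map` and `reduce` are hypothetical, and the `sort` between them only stands in for Hadoop's shuffle phase so the pipeline can be tried locally.

```shell
# Mapper: emit "<key><TAB>1" for every record (key is the second field).
map() { awk '{ print $2 "\t" 1 }'; }

# Reducer: sum the counts for each key read from standard input.
reduce() { awk -F'\t' '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }'; }

# Local stand-in for the job: the first sort plays the role of the shuffle,
# the last sort only makes the printed order deterministic.
counts=$(printf 'A k1\nB k1\nC k2\nD k3\n' | map | sort | reduce | sort)
echo "$counts"
```

On a cluster, the two awk programs would be saved as scripts and passed to the streaming jar with `-mapper` and `-reducer`, without forcing the number of reduce tasks to 0.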
