I am familiar with Hadoop using Java. I'm looking for a sample Hadoop MapReduce program that uses AWK only.
For a text file containing...
A k1
B k1
C k2
D k3
I'm looking for this output:
k1 2
k2 1
k3 1
Upvotes: 1
Views: 3113
I would advise using Hadoop streaming to do this. I'm not an Awk expert by any means, but building on @sudo_O's answer, here is how I would translate it into the Hadoop world:
Write an Awk script that will serve as your mapper. You only need a mapper here; no reducers are required.
$ cat mapper.awk
#!/usr/bin/awk -f
{ a[$2]++ }                          # count occurrences of the key (field 2)
END { for (k in a) print k, a[k] }  # emit each key and its count
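You can sanity-check the mapper before submitting anything: Hadoop streaming just pipes each input split into the mapper's stdin, so a plain shell pipeline reproduces the job locally. This sketch uses the sample input from the question, with a sort appended since awk's in-array iteration order is unspecified:

```shell
# Recreate the mapper script locally (same program as above).
cat > mapper.awk <<'EOF'
#!/usr/bin/awk -f
{ a[$2]++ }                          # count occurrences of the key (field 2)
END { for (k in a) print k, a[k] }  # emit each key and its count
EOF

# Simulate what Hadoop streaming does: feed the input through stdin.
printf 'A k1\nB k1\nC k2\nD k3\n' | awk -f mapper.awk | sort
# k1 2
# k2 1
# k3 1
```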
You can then run your Hadoop streaming job as follows (mapreduce.job.reduces and mapred.reduce.tasks are the new and legacy names for the same setting; both are set to zero so no reduce phase runs regardless of Hadoop version):
${HADOOP_HOME}/bin/hadoop \
jar ${HADOOP_HOME}/contrib/streaming/*.jar \
-D mapreduce.job.reduces=0 \
-D mapred.reduce.tasks=0 \
-input /path/to/input.txt \
-output /path/to/output/dir \
-mapper mapper.awk \
-file /path/to/mapper.awk
You can view the results in HDFS by running:
hadoop fs -cat /path/to/output/dir/*
Upvotes: 4
This will do the trick:
$ awk '{a[$2]++}END{for(k in a)print k,a[k]}' file
k1 2
k2 1
k3 1
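One caveat worth adding to the answer above: `for (k in a)` visits array keys in an unspecified order, so the keys are not guaranteed to come out sorted as shown. Appending a sort makes the output deterministic; this sketch inlines the sample data in place of file:

```shell
# Same counting one-liner, with the question's sample data piped in
# and a sort appended for a deterministic key order.
printf 'A k1\nB k1\nC k2\nD k3\n' |
awk '{a[$2]++} END {for (k in a) print k, a[k]}' |
sort
# k1 2
# k2 1
# k3 1
```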
Upvotes: 4