Koushik Chandra

Reputation: 1491

Word Count using AWK

I have a file like the one below:

this is a sample file
this file will be used for testing

I want to count the words using AWK.

The expected output is:

this 2
is 1
a 1
sample 1
file 2
will 1
be 1
used 1
for 1

I wrote the AWK command below, but it gives some errors:

cat anyfile.txt|awk -F" "'{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}'

Upvotes: 6

Views: 27366

Answers (3)

Chris Koknat

Reputation: 3451

Here is Perl code which provides similar sorted output to Jotne's awk solution:

perl -ne 'for (split /\s+/, $_){ $w{$_}++ }; END{ for $key (sort keys %w) { print "$key $w{$key}\n"}}' testfile

$_ is the current line, which is split based on whitespace /\s+/
Each word is then put into $_
The %w hash stores the number of occurrences of each word
After the entire file is processed, the END{} block is run
The keys of the %w hash are sorted alphabetically
Each word $key and number of occurrences $w{$key} is printed
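
For comparison, the same steps can be sketched in awk plus sort. This is only an illustration: the helper name count_words and the temporary sample file are made up here, and LC_ALL=C is set so that the alphabetical order does not depend on the locale.

```shell
# Hypothetical sample file holding the two lines from the question.
sample=$(mktemp)
printf '%s\n' 'this is a sample file' 'this file will be used for testing' > "$sample"

# count_words is an illustrative helper name, not part of awk or any other tool.
count_words() {
    # Count every whitespace-separated field, print "word count" pairs
    # once the whole file has been read, then sort the pairs by word.
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (word in count) print word, count[word] }' "$1" | LC_ALL=C sort
}

count_words "$sample"
```

Sorting happens outside awk here because the order of a for (word in count) loop is unspecified.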

Upvotes: 0

αғsнιη

Reputation: 2771

Instead of looping over the fields of each line and counting each word in an array ({for(i=1;i<=NF;i++) a[$i]++}), you can use gawk, which supports a multi-character (regular expression) RS (record separator). Every word then becomes its own record and can be counted directly, which is a little faster:

gawk '{a[$0]++} END{for (k in a) print k,a[k]}' RS='[[:space:]]+' file

Output:

used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1

In the gawk command above, I define the record separator as [[:space:]]+, that is, one or more whitespace characters, including spaces and newlines.
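
POSIX leaves the behaviour of a multi-character RS unspecified, so the command above is not guaranteed to work in every awk. Where gawk is not available, a rough equivalent is to let tr put one word on each line before awk counts the records. This swaps in a different technique than the RS approach, and the sample file and variable names below are made up for illustration.

```shell
# Hypothetical sample file holding the two lines from the question.
sample=$(mktemp)
printf '%s\n' 'this is a sample file' 'this file will be used for testing' > "$sample"

# tr turns every whitespace character into a newline and -s squeezes
# repeated newlines into one, so each word ends up on its own line.
# The NF pattern skips any empty line (for example from leading whitespace).
# The final sort only makes the output order predictable.
one_word_per_record=$(tr -s '[:space:]' '\n' < "$sample" |
    awk 'NF { a[$0]++ } END { for (k in a) print k, a[k] }' | LC_ALL=C sort)

printf '%s\n' "$one_word_per_record"
```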

Upvotes: 2

Jotne

Reputation: 41460

It works fine for me:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile
used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1

PS: You do not need to set -F" ", since splitting on any blank is the default.
PS2: Do not use cat with programs that can read files themselves, like awk.
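
The error in the question comes from the missing space between -F" " and the quoted program: the shell joins the two into a single argument, so awk takes the program text as part of the field separator and is left with no program to run. Below is a minimal sketch of the command with that fixed, and with cat and -F dropped; the temporary sample file and the variable names are assumptions for illustration.

```shell
# Hypothetical sample file holding the two lines from the question.
sample=$(mktemp)
printf '%s\n' 'this is a sample file' 'this file will be used for testing' > "$sample"

# No -F option: by default awk splits fields on runs of blanks.
# No cat: awk reads the file named after the program itself.
# The order in which "for (k in a)" visits the words is unspecified.
result=$(awk '{ for (i = 1; i <= NF; i++) a[$i]++ }
              END { for (k in a) print k, a[k] }' "$sample")

printf '%s\n' "$result"
```

If a field separator is really needed, it has to be its own argument, for example awk -F ' ' '...' with a space before the program.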

You can pipe the output through sort to order it by count:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile | sort -k 2 -n
a 1
be 1
for 1
is 1
sample 1
testing 1
used 1
will 1
file 2
this 2
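
As a variation, here is a sketch that lists the most frequent words first and breaks ties alphabetically. It assumes a sort that accepts the usual -k field syntax with the n and r modifiers; the temporary sample file and the variable names are made up for illustration.

```shell
# Hypothetical sample file holding the two lines from the question.
sample=$(mktemp)
printf '%s\n' 'this is a sample file' 'this file will be used for testing' > "$sample"

# -k2,2nr sorts on the count (field 2 only), numerically, highest first.
# -k1,1 then orders words with the same count alphabetically.
by_count=$(awk '{ for (i = 1; i <= NF; i++) a[$i]++ }
                END { for (k in a) print k, a[k] }' "$sample" |
    LC_ALL=C sort -k2,2nr -k1,1)

printf '%s\n' "$by_count"
```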

Upvotes: 12
