JackWM

Reputation: 10535

List all the words in a text file with occurrence counts?

Suppose I have file text.txt as below:

she likes cats, and he likes cats too.

I'd like my result to look like:

she 1
likes 2
cats 2
and 1
he 1
too 1

If it makes the script easier to treat only spaces, commas and periods as word separators, that would be fine.

Is there a simple shell pipeline that could achieve this?

Upvotes: 5

Views: 7571

Answers (2)

Ed Morton

Reputation: 203493

With GNU awk you can just specify the Record Separator (RS) to be any sequence of non-alphabetic characters:

$ gawk -v RS='[^[:alpha:]]+' '{sum[$0]++} END{for (word in sum) print word,sum[word]}' file
she 1
likes 2
and 1
too 1
he 1
cats 2

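but that won't solve the general problem of deciding what counts as a "word": this RS splits contractions such as "don't" and hyphenated words such as "well-known" into separate pieces. Also note that the order of a `for (word in sum)` loop is unspecified, which is why the output above isn't in the order the words appear in the file.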

Upvotes: 0

phs

Reputation: 11051

Here's a one-liner near and dear to my heart:

sed 's|[,.]||g' text.txt | tr -s '[:space:]' '\n' | sort | uniq -c

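The sed strips punctuation (tune the regex to taste) and the tr puts the results one word per line; sort then groups identical words together so that uniq -c can count each run. Note that uniq -c prints the count before the word.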
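Because uniq -c puts the count first, you can append one more awk stage to get the "word count" layout the question asks for. A minimal sketch, assuming a hypothetical function name `word_counts`; it also uses `tr -s '[:space:]'` so that runs of spaces or newlines don't turn into empty "words":

```shell
#!/bin/sh
# Hypothetical helper: same pipeline as above, with the columns swapped.
# Reads the files given as arguments, or stdin if there are none.
word_counts() {
  sed 's|[,.]||g' "$@" |      # strip commas and periods
    tr -s '[:space:]' '\n' |  # one word per line, squeezing repeated whitespace
    sort |                    # bring identical words together
    uniq -c |                 # "count word" for each distinct word
    awk '{print $2, $1}'      # swap to "word count"
}

# Demo with the sample sentence from the question.
printf 'she likes cats, and he likes cats too.\n' | word_counts
```

The output comes out in sorted order (`and 1`, `cats 2`, `he 1`, `likes 2`, `she 1`, `too 1`) rather than in order of appearance; pipe it through `sort -k2,2nr` if you'd rather see the most frequent words first.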

Upvotes: 20
