user3639557

Reputation: 5291

Listing the words that follow a given word, along with their frequencies, in a text file

Is there a clean way, using grep, sed, or awk, to get a list of the words that follow a particular pattern in a text file, along with their frequencies? For example, assume the following text file:

155 20 120 156 20 9 157 158 9 40
163 7 95 164 20 9 165 9 40
99 100 20 15 29 101 6 9 40 165
9 22 23 167 168 9 165 171 40

I want to know which words follow 9, and how many times each of them occurs immediately after 9. The output should look like this:

157 1
40  3
165 2
22  1

Upvotes: 2

Views: 92

Answers (6)

Eugeniu Rosca

Reputation: 5315

Try this:

grep -owE "9 [0-9]+" filename | sed "s/^9 //" | sort -n | uniq -c

It returns nearly what you want (the count comes first, and the rows are sorted by word):

  1 22
  3 40
  1 157
  2 165
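The counts come first because that is how uniq -c formats its output. One possible way to get the word-then-count layout from the question is to swap the two columns with awk at the end of the pipeline. This is only a sketch, and the function name followers_of_9 is made up for illustration:

```shell
followers_of_9() {
  # Reads text on stdin; prints "word count" pairs sorted numerically by word.
  grep -owE '9 [0-9]+' |
    sed 's/^9 //' |
    sort -n |
    uniq -c |
    awk '{ print $2, $1 }'   # uniq -c prints the count first, so swap the columns
}
```

It could be called as followers_of_9 < filename. The rows are still sorted by word rather than by first appearance.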

Limitation: grep -o reports non-overlapping matches, so when one 9 directly follows another (as in 9 9 40), the word after the second 9 is not counted.
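To make the consecutive-9 case concrete, here is a small sketch that wraps the grep pipeline and a field-by-field awk loop in two functions so they can be compared on the same input. The function names count_with_grep and count_with_awk are hypothetical, and the awk loop is just one way to compare each field with its predecessor:

```shell
count_with_grep() {
  # Non-overlapping matches: in "9 9 40" the match "9 9" consumes the second 9.
  grep -owE '9 [0-9]+' | sed 's/^9 //' | sort -n | uniq -c
}

count_with_awk() {
  # Compares every field with the one before it, so no 9 is skipped.
  awk '{ for (i = 2; i <= NF; i++) if ($(i-1) == 9) c[$i]++ }
       END { for (w in c) print c[w], w }' | sort -k2,2n
}
```

For the made-up input line 9 9 40, printf '9 9 40\n' | count_with_grep reports only the follower 9, while printf '9 9 40\n' | count_with_awk reports both 9 and 40.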

Upvotes: 3

cr1msonB1ade

Reputation: 1716

Here is an awk-only solution:

awk '{
    for (i = 1; i < NF; i++)
        if ($i == 9) nextToNine[$(i+1)]++
}
END {
    for (j in nextToNine) print j "\t" nextToNine[j]
}' test.txt

Upvotes: 1

choroba

Reputation: 241988

A Perl-only solution:

perl -ne '$h{$1}++ while /\b9 (\w+)/g }{ print "$_ $h{$_}\n" for keys %h' input.txt

Upvotes: 1

Ed Morton

Reputation: 203995

With GNU awk for multi-char RS:

$ awk -v RS='\\s+' 'p==9{c[$0]++} {p=$0} END{for (w in c) print w, c[w]}' file
165 2
157 1
22 1
40 3

With other awks:

$ awk '{for (i=2;i<=NF;i++) if ($(i-1)==9) c[$i]++} END{for (w in c) print w, c[w]}' file
165 2
157 1
22 1
40 3
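The order in which for (w in c) visits the array is unspecified in awk, which is why the output above does not match the order shown in the question. A minimal sketch that prints the followers in order of first appearance, with the target word passed in as a variable, might look like this. The names follow_counts, target, and order are illustrative assumptions, not part of the original answer:

```shell
follow_counts() {
  # $1: the word whose followers are counted; the text is read from stdin.
  awk -v target="$1" '
    {
      for (i = 2; i <= NF; i++)
        if ($(i-1) == target) {
          if (!($i in c)) order[++n] = $i   # remember first-seen order
          c[$i]++
        }
    }
    END { for (k = 1; k <= n; k++) print order[k], c[order[k]] }
  '
}
```

Called as follow_counts 9 < file, it only looks at neighbours within a line, like the loop above, and keeps a second array purely to record the order in which new followers show up.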

Upvotes: 4

Jahid

Reputation: 22438

Using grep with Perl-compatible regular expressions (the -P option of GNU grep):

grep -oP "(?<=\b9\s)\d+" file | sort -n | uniq -c

Upvotes: 1

glenn jackman

Reputation: 247022

With awk, you can write:

awk '
    {
        for (i=1; i<NF; i++) 
            if ($i == 9) 
                follow[$(++i)]++
    } 
    END {
        for (f in follow) 
            print f, follow[f]
    }
' file
22 1
40 3
157 1
165 2

Upvotes: 2
