Reputation: 5291
Is there a clean way to get a list of words that follow a particular pattern in a text file, along with their frequencies, using grep, sed, or awk? For example, assume the following text file:
155 20 120 156 20 9 157 158 9 40
163 7 95 164 20 9 165 9 40
99 100 20 15 29 101 6 9 40 165
9 22 23 167 168 9 165 171 40
I want to know which words follow 9 and how many times each one occurs right after a 9. So the output should look like this:
157 1
40 3
165 2
22 1
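To try the answers below, the sample input can be recreated with a here-document. The file name "file" is an assumption chosen to match most of the answers; adjust it for the ones that read test.txt or input.txt.

```shell
#!/bin/sh
# Recreate the sample input from the question in a file named "file".
# The file name is an assumption; some answers read test.txt or input.txt instead.
cat > file <<'EOF'
155 20 120 156 20 9 157 158 9 40
163 7 95 164 20 9 165 9 40
99 100 20 15 29 101 6 9 40 165
9 22 23 167 168 9 165 171 40
EOF
```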
Upvotes: 2
Views: 92
Reputation: 5315
Try this:
grep -owE "9 [0-9]+" filename | sed "s/^9 //" | sort -n | uniq -c
It returns nearly what you want (count first, then the word):
1 22
3 40
1 157
2 165
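Limitation: when two 9s appear in a row (e.g. 9 9 40), the word after the second 9 is not counted, because grep -o has already consumed that 9 as part of the previous match.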
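To see the limitation concretely, here is the same pipeline run on a made-up line with two 9s in a row (the input is illustrative, not from the question). grep -o matches "9 9" as one piece and resumes scanning after it, so the "9 40" pair is never seen:

```shell
#!/bin/sh
# Made-up input: the second 9 is itself followed by 40.
result=$(printf '1 9 9 40\n' | grep -owE "9 [0-9]+" | sed "s/^9 //")
# Only "9" is reported; 40 (which follows the second 9) is missed.
printf '%s\n' "$result"
```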
Upvotes: 3
Reputation: 1716
Here is an awk-only solution:
awk '{
    for (i = 1; i < NF; i++)
        if ($i == 9) nextToNine[$(i+1)]++
}
END { for (j in nextToNine) print j "\t" nextToNine[j] }' test.txt
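awk's for (j in array) loop visits keys in an unspecified order, so the lines may come out in any order. If a stable order matters, one option (an addition, not part of the original answer) is to pipe the result through sort. This sketch feeds the question's sample on stdin instead of reading test.txt, then sorts by count descending and by word ascending:

```shell
#!/bin/sh
# Sample input from the question, piped in instead of read from test.txt.
# The awk program is the one from this answer; only the final sort is added.
# sort: field 2 numeric descending (count), then field 1 numeric (word).
result=$(printf '%s\n' \
    '155 20 120 156 20 9 157 158 9 40' \
    '163 7 95 164 20 9 165 9 40' \
    '99 100 20 15 29 101 6 9 40 165' \
    '9 22 23 167 168 9 165 171 40' |
  awk '{
    for (i = 1; i < NF; i++)
        if ($i == 9) nextToNine[$(i+1)]++
  }
  END { for (j in nextToNine) print j "\t" nextToNine[j] }' |
  sort -k2,2nr -k1,1n)
printf '%s\n' "$result"
```

On the sample this prints 40, 165, 22 and 157 (in that order) with counts 3, 2, 1 and 1, tab-separated.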
Upvotes: 1
Reputation: 241988
Perl-only solution:
perl -ne '$h{$1}++ while /\b9 (\w+)/g }{ print "$_ $h{$_}\n" for keys %h' input.txt
Upvotes: 1
Reputation: 203995
With GNU awk for multi-char RS:
$ awk -v RS='\\s+' 'p==9{c[$0]++} {p=$0} END{for (w in c) print w, c[w]}' file
165 2
157 1
22 1
40 3
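Because this version treats the whole file as one stream of words rather than working line by line, a 9 at the end of a line is paired with the first word of the next line; the field-based versions do not do that. The question's sample has no line ending in 9, so both give the same result there. A small made-up input shows the difference. The original uses GNU awk's \s; this sketch uses [ \t\n]+ instead, assuming your awk treats a multi-character RS as a regex (gawk and mawk do):

```shell
#!/bin/sh
# Made-up input in which the first line ends with 9.
input='1 9
40 2'
# Word-stream version: records are separated by runs of whitespace,
# so "40" on the second line counts as following the 9.
stream=$(printf '%s\n' "$input" |
  awk -v RS='[ \t\n]+' 'p==9{c[$0]++} {p=$0} END{for (w in c) print w, c[w]}')
# Field-based version: 9 is the last field on its line, so nothing follows it.
fields=$(printf '%s\n' "$input" |
  awk '{for (i=2;i<=NF;i++) if ($(i-1)==9) c[$i]++} END{for (w in c) print w, c[w]}')
printf 'stream: %s\nfields: %s\n' "$stream" "$fields"
```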
With other awks:
$ awk '{for (i=2;i<=NF;i++) if ($(i-1)==9) c[$i]++} END{for (w in c) print w, c[w]}' file
165 2
157 1
22 1
40 3
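Since this loop looks back at $(i-1) rather than consuming the field after a 9, it also handles two 9s in a row, unlike the grep -o pipeline in the first answer. A quick check on a made-up line:

```shell
#!/bin/sh
# Made-up input with two consecutive 9s.
result=$(printf '1 9 9 40\n' |
  awk '{for (i=2;i<=NF;i++) if ($(i-1)==9) c[$i]++} END{for (w in c) print w, c[w]}')
# Both "9" (after the first 9) and "40" (after the second 9) are counted;
# for-in order is unspecified, so sort for display.
printf '%s\n' "$result" | sort -n
```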
Upvotes: 4
Reputation: 22438
Using Perl regex with grep:
grep -oP "(?<=\b9\s)\d+" file |sort -n|uniq -c
Upvotes: 1
Reputation: 247022
With awk, you can write:
awk '
{
    for (i=1; i<NF; i++)
        if ($i == 9)
            follow[$(++i)]++
}
END {
    for (f in follow)
        print f, follow[f]
}
' file
22 1
40 3
157 1
165 2
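One subtlety: $(++i) advances i past the word that follows the 9, so that word is never itself tested. With two 9s in a row, the word after the second 9 is skipped, the same limitation as the grep -o pipeline. Using $(i+1) instead, as in the awk-only answer above, avoids this. A made-up example comparing the two:

```shell
#!/bin/sh
# Made-up input with two consecutive 9s.
line='1 9 9 40'
# As in this answer: $(++i) skips the field after each 9.
skip=$(printf '%s\n' "$line" |
  awk '{for (i=1; i<NF; i++) if ($i == 9) follow[$(++i)]++} END {for (f in follow) print f, follow[f]}')
# Variant with $(i+1): every field is still tested.
noskip=$(printf '%s\n' "$line" |
  awk '{for (i=1; i<NF; i++) if ($i == 9) follow[$(i+1)]++} END {for (f in follow) print f, follow[f]}')
printf 'skip:\n%s\nnoskip:\n%s\n' "$skip" "$(printf '%s\n' "$noskip" | sort -n)"
```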
Upvotes: 2