Reputation: 43
I would like to list each unique number that follows a specific string, along with a count of how many times that number occurs in a file. The specific string is 'length', and the number I want is the first one after it.
Current data for example:
*random string* length: 40
*random string* length: 54
*random string* length: 40
*random string* length: 60
*random string* length: 60
*random string* length: 60
Ideal result would be:
length 40: 2
length 54: 1
length 60: 3
At the moment I am scripting a separate count for each number I manually spot in a 24,111-line file, which isn't practical.
cat file.txt | awk '/length: 60/ {total++} END {print total}'
Upvotes: 2
Views: 86
Reputation: 58463
This might work for you (GNU sed):
sed -E 's/.* (\S+:.*)/\1 1/;H;x;s/(\n[^:]*: \S+ )(\S+)(.*)\1.*$/\1$((\2+1))\3/
x;$!d;x;s/.(.*)/echo "\1"/e' file
Remove junk from the start of each line and add a counter.
Append the amended line to the hold space and increment the counter if the line already exists (removing the duplicate) and then delete the current line.
At the end of file, swap to the hold space, remove the introduced newline at the start of the hold space, create an echo command and evaluate it.
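To make the first step concrete, here is a small sketch that runs only the first substitution on one line. It assumes GNU sed (for `\S`), and the input line is a made-up sample in the question's format, not real data.

```shell
# Hypothetical sample line in the same shape as the question's data.
line='foo length: 40'

# First substitution only: drop everything before the last "word:" token
# and append an initial counter of 1. Requires GNU sed for \S.
step1=$(printf '%s\n' "$line" | sed -E 's/.* (\S+:.*)/\1 1/')

printf '%s\n' "$step1"
```

The remaining commands then work on lines of this shape in the hold space, bumping the trailing counter whenever the same `length: N` prefix turns up again.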
Upvotes: 1
Reputation: 133590
With your shown samples, please try the following awk code.
awk '
match($0,/length: [0-9]+/){
  cnts[substr($0,RSTART,RLENGTH)]++
}
END{
  for(key in cnts){
    print key": "cnts[key]
  }
}
' Input_file
Explanation: The match function looks for the string length: followed by digits on each line. The matched substring (taken with substr, RSTART and RLENGTH) is used as the index of the array cnts, and its value is incremented each time the same match is seen. The END block then prints each index together with its count.
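As I read the code above, the array keys are the whole match (for example `length: 40`), and awk's `for (key in ...)` loop does not guarantee any order. If the exact format and ordering from the question matter, one possible variant is sketched below. It is only an illustration: the sample file is a temporary stand-in for the real data, and the offset of 8 assumes the literal prefix `length: ` (eight characters).

```shell
# Hypothetical sample file reproducing the data shape from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
foo length: 40
bar length: 54
baz length: 40
qux length: 60
quux length: 60
corge length: 60
EOF

result=$(
  awk '
    match($0, /length: [0-9]+/) {
      # Skip the 8 characters of "length: " so only the digits are kept.
      num = substr($0, RSTART + 8, RLENGTH - 8)
      cnts[num]++
    }
    END {
      # Iteration order is unspecified, so sorting is left to sort(1).
      for (num in cnts)
        print "length " num ": " cnts[num]
    }
  ' "$input" | sort -k2,2n
)

printf '%s\n' "$result"
rm -f "$input"
```

Sorting numerically on the second field keeps the lines in ascending order of the number, which matches the layout shown in the question.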
Upvotes: 5
Reputation: 203807
If you don't care where the count appears in the output:
$ sed 's/.*\(length\):/\1/' file | sort | uniq -c
2 length 40
1 length 54
3 length 60
or if you need exactly the output format in your question:
$ sed 's/.*\(length\):/\1/' file | sort | uniq -c | awk '{print $2, $3":", $1}'
length 40: 2
length 54: 1
length 60: 3
Upvotes: 2