nobody-at-all

Reputation: 43

List unique values and count the first numeric match after a string

I would like to list each unique number that follows a specific string in a file, along with a count of how many times that number occurs. The specific string is 'length', and the number I want is the first one after it.

Example data:

*random string* length: 40
*random string* length: 54
*random string* length: 40
*random string* length: 60
*random string* length: 60
*random string* length: 60

Ideal result would be:

length 40: 2
length 54: 1
length 60: 3

At the moment I am writing a separate count for each number I spot manually in a 24,111-line file, which isn't practical:

cat file.txt | awk '/length: 60/ {total++} END {print total}'
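If you only need one value at a time, the command above can be tightened a little. This is a minimal sketch, assuming POSIX awk and that the prefix is always exactly "length: "; count_length is a hypothetical helper name. It compares the whole number, so 60 no longer also matches 600, and it prints 0 rather than an empty line when nothing matches:

```shell
# Hypothetical helper: count lines in file $1 whose "length:" value is exactly $2.
count_length() {
  # match() finds "length: N"; skipping the 8-character "length: " prefix
  # leaves just the digits, which are compared to the requested value.
  # total + 0 prints 0 instead of an empty line when nothing matched.
  awk -v n="$2" '
    match($0, /length: [0-9]+/) && substr($0, RSTART + 8, RLENGTH - 8) == n { total++ }
    END { print total + 0 }
  ' "$1"
}
```

For example, count_length file.txt 60 would print 3 for the sample data above.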

Upvotes: 2

Views: 86

Answers (3)

potong

Reputation: 58463

This might work for you (GNU sed):

sed -E 's/.* (\S+:.*)/\1 1/;H;x;s/(\n[^:]*: \S+ )(\S+)(.*)\1.*$/\1$((\2+1))\3/
        x;$!d;x;s/.(.*)/echo "\1"/e' file

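Strip the junk from the start of each line, keeping only "length: N", and append a counter of 1.

Append the amended line to the hold space. If that line already exists there, remove the new duplicate and increment the earlier entry's counter by wrapping it in a shell arithmetic expression, $((count+1)). Then delete the current line.

At the end of the file, swap in the hold space, remove the newline introduced at its start, turn it into an echo command, and execute it with the e flag so the shell evaluates the arithmetic and prints the final counts.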
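The "increment" step never adds numbers inside sed: each repeat wraps the previous count in another $((...+1)), and the arithmetic only happens when the e flag hands the echo command to the shell. A small illustration, assuming a POSIX shell; the variable held is a hypothetical stand-in for what the hold space would contain after three "length: 60" lines:

```shell
# What the hold-space entry looks like after three "length: 60" lines:
# the first count is 1, and each duplicate wraps it in another $((...+1)).
held='length: 60 $(($((1+1))+1))'

# The sed e flag runs 'echo "<text>"' through the shell, which expands the
# nested arithmetic; eval reproduces that step here (prints: length: 60 3).
eval "echo \"$held\""
```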

Upvotes: 1

RavinderSingh13

Reputation: 133590

With your shown samples, please try the following awk code.

awk '
match($0,/length: [0-9]+/){
  cnts[substr($0,RSTART,RLENGTH)]++    # key is the matched "length: NN" text
}
END{
  for(key in cnts){
    label=key
    sub(/:/,"",label)                  # "length: 40" -> "length 40"
    print label": "cnts[key]
  }
}
' Input_file

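Explanation: the match function finds "length: digits" on each line; the matched text (extracted with substr using RSTART and RLENGTH) becomes an array index, and the count stored at that index is incremented. The END block prints each index and its count in the required output format. Note that for (key in cnts) visits keys in an unspecified order, so pipe the output through sort if the order matters.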
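A variation on the same idea, sketched under the assumption that the prefix is always exactly "length: " (8 characters), so the number can be cut out with substr; count_lengths is a hypothetical wrapper name. Keying the array on the number alone and sorting numerically makes the output order deterministic and places 100 after 60:

```shell
# Hypothetical wrapper: count each "length: N" value in the given files
# (or stdin) and print "length N: count", sorted numerically by N.
count_lengths() {
  awk '
    match($0, /length: [0-9]+/) {
      # Skip the 8-character "length: " prefix to keep just the number
      cnts[substr($0, RSTART + 8, RLENGTH - 8)]++
    }
    END {
      for (n in cnts) print "length " n ": " cnts[n]
    }
  ' "$@" | sort -k2,2n   # field 2 is "N:", and -n reads its leading number
}
```

Usage: count_lengths Input_file.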

Upvotes: 5

Ed Morton

Reputation: 203807

If you don't care where the count appears in the output:

$ sed 's/.*\(length\):/\1/' file | sort | uniq -c
      2 length 40
      1 length 54
      3 length 60

or if you need exactly the output format in your question:

$ sed 's/.*\(length\):/\1/' file | sort | uniq -c | awk '{print $2, $3":", $1}'
length 40: 2
length 54: 1
length 60: 3
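Plain sort compares text, so "length 100" would be listed before "length 40". Below is a sketch of the same pipeline with two assumed tweaks: sed -n with the p flag prints only lines that actually contain "length:", and a final sort -k2,2n orders the results by number. length_counts is a hypothetical wrapper name:

```shell
# Hypothetical wrapper around the pipeline above; $1 is the input file.
length_counts() {
  # -n plus the p flag: print only lines where the substitution happened
  sed -n 's/.*\(length\):/\1/p' "$1" |
    sort | uniq -c |
    awk '{print $2, $3":", $1}' |
    sort -k2,2n   # order by the number in "N:", numerically
}
```

Usage: length_counts file.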

Upvotes: 2
