Liu Will

Reputation: 79

How to count repeated sentences in shell

cat file1.txt
abc bcd abc ...
abcd bcde cdef ...
abcd bcde cdef ...
abcd bcde cdef ...
efg fgh ...
efg fgh ...
hig ...

My expected result is:

abc bcd abc ...      

abcd bcde cdef ...  
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I have found a way to do this, but my code is a little noisy:

cat file1.txt | uniq -c | sed -e 's/ \+/ /g' -e 's/^.//g' | awk '{print $0," ",$1}'| sed -e 's/^[2-9] /\n/g' -e 's/^[1] //g' |sed -e 's/[^1]$/\n<!!! pay attention, above sentence has repeated & times !!!> \n/g' -e 's/[1]$//g'

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I was wondering if you could show me a more efficient way to achieve this. Thanks a lot.

Upvotes: 1

Views: 73

Answers (3)

Ed Morton

Reputation: 203899

$ awk '
    $0==prev { cnt++; next }
    { prt(); prev=$0; cnt=1 }
    END { prt() }
    function prt() {
        if (cnt) print prev (cnt>1 ? ORS "repeated " cnt " times" : "") ORS
    }
' file
abc bcd abc ...

abcd bcde cdef ...
repeated 3 times

efg fgh ...
repeated 2 times

hig ...
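The script above prints a shorter message than the one requested in the question. As a minimal sketch (the function name count_repeats is hypothetical), the same adjacent-line logic can be wrapped in a shell function that prints the exact warning text from the question. Like uniq, it only merges duplicates that are adjacent.

```shell
# count_repeats (hypothetical name): collapse runs of identical adjacent
# lines and print the warning from the question after any run longer than one.
count_repeats() {
    awk '
        function prt() {
            # cnt is still 0 before the first line has been stored
            if (cnt) {
                print prev
                if (cnt > 1)
                    print "<!!! pay attention, above sentence has repeated " cnt " times !!!>"
                print ""
            }
        }
        NR > 1 && $0 == prev { cnt++; next }   # same as previous line: extend the run
        { prt(); prev = $0; cnt = 1 }          # new line: flush the old run, start a new one
        END { prt() }                          # flush the final run
    ' "$@"
}
```

Usage: count_repeats file1.txt, or pipe the input into it. Checking cnt instead of NR in prt() also makes a one-line file print its line.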

Upvotes: 2

glenn jackman

Reputation: 247002

If your lines are not already grouped, then you could use

awk '
    NR == FNR {count[$0]++; next} 
    !seen[$0]++ {
        print
        if (count[$0] > 1)
            print "... repeated", count[$0], "times"
    }
' file1.txt file1.txt

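This keeps every distinct line in memory (in both the count and seen arrays), so it can use a lot of memory if your file is very large. In that case you might want to sort it first, although sorting changes the original line order.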
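The version above reads file1.txt twice, which does not work when the input comes from a pipe. A minimal sketch of a single-pass alternative (the function name first_seen_counts is hypothetical) records the order in which distinct lines first appear and prints everything in the END block. It holds the same kind of per-line data, so the memory caveat still applies.

```shell
# first_seen_counts (hypothetical name): single-pass variant for input that
# can only be read once. Memory grows with the number of distinct lines.
first_seen_counts() {
    awk '
        !($0 in count) { order[++n] = $0 }   # first time this line is seen: remember its position
        { count[$0]++ }
        END {
            for (i = 1; i <= n; i++) {
                print order[i]
                if (count[order[i]] > 1)
                    print "... repeated", count[order[i]], "times"
            }
        }
    ' "$@"
}
```

Usage: some_command | first_seen_counts, or first_seen_counts file1.txt.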

Upvotes: 1

RomanPerekhrest

Reputation: 92854

sort + uniq + sed solution:

sort file1.txt | uniq -c | sed -E 's/^ +1 (.+)/\1\n/; 
 s/^ +([2-9]|[0-9]{2,}) (.+)/\2\n<!!! pay attention, the above sentence has repeated \1 times !!!>\n/'

The output:

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, the above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, the above sentence has repeated 2 times !!!>

hig ...

Or with awk:

sort file1.txt | uniq -c | awk '{ n=$1; sub(/^ +[0-9]+ +/,""); 
printf "%s\n%s",$0,(n==1? ORS:"<!!! pay attention, the above sentence has repeated "n" times !!!>\n\n") }'
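Because uniq -c only counts adjacent duplicates, sort is what makes these pipelines work on ungrouped input, but it also reorders the lines (the output above keeps the original order only because the sample file happens to be sorted already). If the duplicates are already grouped, as in file1.txt, a minimal sketch can drop sort and keep the original order. The function name grouped_counts is hypothetical, and the prefix-stripping regex assumes uniq -c prints optional padding, the count and a single space, as GNU uniq does.

```shell
# grouped_counts (hypothetical name): uniq -c + awk without sort, for input
# whose duplicates are already adjacent; reads standard input.
grouped_counts() {
    uniq -c | awk '{
        n = $1                    # the count printed by uniq -c
        sub(/^ *[0-9]+ /, "")     # strip padding, count and one separating space
        print
        if (n > 1)
            print "<!!! pay attention, the above sentence has repeated " n " times !!!>"
        print ""
    }'
}
```

Usage: grouped_counts < file1.txt. Unlike sub(/^ +[0-9]+ +/,""), this regex removes only one space after the count, so a sentence that starts with spaces keeps them.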

Upvotes: 2
