Reputation: 1704
I am looking for something like this, but instead of counting the number of duplicated lines I need to count the number of duplicated bunches of lines.
For the sake of clarification, I have a file like this:
Separator
line11
line12
line13
Separator
line21
line22
line23
Separator
line11
line12
line13
Separator
line11
line12
line13
Separator
line31
line32
line33
Separator
line21
line22
line23
And I would expect an output as follows:
3: Separator
line11
line12
line13
2: Separator
line21
line22
line23
1: Separator
line31
line32
line33
Where 3:, 2: and 1: mean the number of times each bunch of lines appears in the file.
I tried the following command, without success:
sort all_lits.txt | uniq -c
Currently I am writing an awk command to obtain this information, but I have nothing clear yet. As soon as I have a command worth showing, I will post it.
Is it possible to get this information using some combination of UNIX tools such as awk, grep, wc, sort, etc.?
I know I could write a script to do it, but I would like to avoid that. As a last resort I will.
Any help would be highly appreciated.
Upvotes: 0
Views: 121
Reputation: 246724
awk -v RS=Separator '
NR>1 {count[$0]++}
END {for (bunch in count) print count[bunch], RS, bunch}
' file
1 Separator
line31
line32
line33
2 Separator
line21
line22
line23
3 Separator
line11
line12
line13
There is no inherent order to the output. If you want it sorted by count, descending, and you're using GNU awk:
awk -v RS=Separator '
NR>1 {count[$0]++}
END {
PROCINFO["sorted_in"] = "@val_num_desc"
for (bunch in count) print count[bunch], RS, bunch
}
' file
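PROCINFO["sorted_in"] is a gawk extension that controls the order in which for (bunch in count) traverses the array; "@val_num_desc" visits the elements by numeric value, largest first. With the sample input above, this should print the same bunches, highest count first:
3 Separator
line11
line12
line13
2 Separator
line21
line22
line23
1 Separator
line31
line32
line33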
Upvotes: 2
Reputation: 1704
This is the script I am using. It is still being tested, but it may serve as a base for others:
with open(file_name, mode="r") as bigfile:   # file_name is the path to the input file
    reader = bigfile.read()

# Count how many times each bunch (the text between 'Separator' keywords) appears
d = dict()
for res in reader.split('Separator'):
    if res in d:
        d[res] = d[res] + 1
    else:
        d[res] = 1

for k in d:
    print str(k) + ':' + str(d[k])
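For reference, a roughly equivalent Python 3 sketch (just a sketch, assuming the same 'Separator' keyword, that the empty chunk before the first separator should be ignored, and that the input file is the all_lits.txt mentioned in the question):

from collections import Counter

file_name = 'all_lits.txt'  # input file; adjust as needed

with open(file_name) as bigfile:
    chunks = bigfile.read().split('Separator')

# Count each non-empty bunch of lines
counts = Counter(chunk for chunk in chunks if chunk.strip())

# Print the most frequent bunches first, in the "count: Separator ..." format
for chunk, n in counts.most_common():
    print('{}: Separator{}'.format(n, chunk.rstrip('\n')))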
Upvotes: 1