Process 10 lines of the sample data at a time

Question

I would like to write a loop that takes 10 lines of my input file at a time and writes the result to an output file, continuing to append lines to the output file rather than overwriting it.

Here is some sample data:

FilePath    Filename    Probability ClassifierID    HectorFileType  LibmagicFileType
/mnt/Hector/Data/benign/binary/benign-pete/ 01d0cd964020a1f498c601f9801742c1    19  S040PDFv02  data.pdf    PDF document
/mnt/Hector/Data/benign/binary/benign-pete/ 0299a1771587043b232f760cbedbb5b7    0   S040PDFv02  data.pdf    PDF document

I then use the following command to count how many times each unique value in column 4 (ClassifierID) appears, and print each value with its count:

cut -f 4 input.txt | sort | uniq -c | awk '{print $2, $1}' | sed 1d
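One caveat about this pipeline: sed 1d runs after sort, so it deletes whichever tally line sorts first. That is the header's line only when "ClassifierID" happens to sort before every real ID; an ID such as A100 would be dropped instead. A minimal sketch that removes the header before counting, assuming the file has exactly one tab-separated header line (the function name count_ids is hypothetical):

```shell
# Hypothetical wrapper around the pipeline above. tail -n +2 drops line 1
# (the header) before sorting, so the header is never counted, no matter
# where "ClassifierID" would fall in the sort order.
count_ids() {
    tail -n +2 "$1" | cut -f 4 | sort | uniq -c | awk '{print $2, $1}'
}
```

Usage: count_ids input.txt >> output.txt appends the tallies instead of overwriting the file.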

So ultimately, I just need help writing a loop that runs that pipeline on 10 lines of data at a time and appends the output to an output file.
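A literal shell-loop version of what the question asks for is sketched below. It assumes a single header line at the top of the file (skipped once), tab-separated columns, and a trailing newline on the last line (wc -l counts newlines). The function name tally_in_chunks and its optional third argument (chunk size, default 10) are hypothetical:

```shell
# Hypothetical helper: skip the single header line, then run the
# cut | sort | uniq -c | awk pipeline on each chunk of data lines and
# append (>>) every chunk's tally to the output file, so earlier results
# are never overwritten.
tally_in_chunks() {
    input=$1
    output=$2
    size=${3:-10}
    # $(( )) strips the leading spaces some wc implementations print
    total=$(( $(wc -l < "$input") ))
    start=2                      # line 1 is the header
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        sed -n "${start},${end}p" "$input" |
            cut -f 4 | sort | uniq -c | awk '{print $2, $1}' >> "$output"
        start=$((end + 1))
    done
}
```

Usage: tally_in_chunks input.txt output.txt. Because it appends, truncate output.txt first if you want a fresh run. The design is simple rather than fast: sed rereads the file for every chunk, whereas a single AWK pass reads it only once.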

Hai Vu · Accepted Answer

If I understand correctly, for every block of 10 lines, you are trying to:

Skip the header, which is the first line of the block.
Count how many times each value of field #4 (ClassifierID) occurs, and output each value followed by its count.

Here is an AWK script which will do it:

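# Tally field 4 on every line except the first line of each 10-line block
FNR % 10 != 1 {
    ++count[$4]
}

# After every 10th line, print the block's tallies and reset them
FNR % 10 == 0 {
    for (i in count) {
        print i, count[i]
        delete count[i]
    }
}

# Print the tallies of a final block shorter than 10 lines, if any
END {
    for (i in count)
        print i, count[i]
}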

Discussion

The FNR % 10 != 1 block processes every line except lines 1, 11, 21, and so on, i.e. the lines you want to skip. This block keeps a count of field $4.
The FNR % 10 == 0 block prints a summary of that block and resets the count (via delete).
My script does not sort its output, so the lines may appear in a different order than with sort | uniq -c.
If you want to tally the whole file rather than each block of 10, replace FNR % 10 == 0 with END.
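To run the block-by-block AWK program from the shell and append its summaries to an output file (keeping earlier results, as the question asks), something like the following sketch should work. The function name tally_blocks is hypothetical. It also includes an END block that prints the tallies of a final block shorter than 10 lines, which the FNR % 10 == 0 rule alone never reaches:

```shell
# Hypothetical wrapper: run the block-tally AWK program on an input file
# and append (>>) the per-block summaries to an output file.
tally_blocks() {
    awk '
        # Skip the first line of every 10-line block; tally field 4 otherwise
        FNR % 10 != 1 { ++count[$4] }

        # After every 10th line, print the tallies and reset them
        FNR % 10 == 0 {
            for (i in count) {
                print i, count[i]
                delete count[i]
            }
        }

        # Flush a trailing block with fewer than 10 lines, if any
        END {
            for (i in count)
                print i, count[i]
        }
    ' "$1" >> "$2"
}
```

Usage: tally_blocks input.txt output.txt. Using >> instead of > is what keeps the existing contents of output.txt.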
