check and return invalid data

Question

I need to check the following data and report the number of rows that do not match the given criteria.

set 582:1960:4c31ed7dea 2012-03-10~23:55:00

set 565:388:13c10fd316 2012-03-10~23:55:00

set 519:348:361189d4b9 extra_text 2012-03-10~23:55:00

set 498:5634:6047172ecc 2012-03-10~23:55:00

set 565:0:bf7a80ee4f 2012-03-10~23:55:00

1) All lines should start with the word "set" and end with a carriage return ("\r").

2) All lines should have exactly 3 fields delimited by spaces.

For the example data, the result should be an invalid row count of 2, preferably along with the offending lines themselves. The third line has an extra word and the fifth line does not end correctly.

Kevin · Accepted Answer

awk is good for this. A fairly full-featured script:

#!/usr/bin/awk -f

BEGIN {ends = fields = total = 0 }

NF != 3 || !/\r$/ {
    total++
    if(NF != 3) fields++
    if(!/\r$/) ends++
    print
}

END {
    # print (not printf) so each summary line ends with a newline
    print "Wrong number of fields: " fields
    print "Did not end in a CR: " ends
    print "Total: " total
}

Short one-liner, only prints offending lines:

awk 'NF != 3 || !/\r$/' file

Prints and counts total:

awk 'NF!=3||!/\r$/{print; total++} END{print "Total: " total}' file
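The commands above check the field count and the trailing CR, but not the leading "set" from the first criterion. A minimal sketch that adds that check might look like the following; the function name check_file and the file name sample.txt are made up for illustration, and the sample rows are adapted from the question with one row changed to start with "get".

```shell
# Hypothetical helper: print every invalid line, then the total count.
# A line is invalid if its first field is not "set", it does not have
# exactly 3 fields, or it does not end in a carriage return.
check_file() {
    awk '$1 != "set" || NF != 3 || !/\r$/ { print; bad++ }
         END { print "Total: " bad + 0 }' "$1"
}

# Build a small sample file (assumed name); \r is the carriage return.
printf 'set 582:1960:4c31ed7dea 2012-03-10~23:55:00\r\n'             > sample.txt
printf 'set 519:348:361189d4b9 extra_text 2012-03-10~23:55:00\r\n'  >> sample.txt
printf 'set 565:0:bf7a80ee4f 2012-03-10~23:55:00\n'                 >> sample.txt
printf 'get 498:5634:6047172ecc 2012-03-10~23:55:00\r\n'            >> sample.txt

check_file sample.txt
```

The "bad + 0" forces a numeric 0 in the summary when every line is valid, instead of an empty string.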
