shantanuo

Reputation: 32276

check and return invalid data

I need to check the following data and report the number of rows that do not match the given criteria.

set 582:1960:4c31ed7dea 2012-03-10~23:55:00\r\n
set 565:388:13c10fd316 2012-03-10~23:55:00\r\n
set 519:348:361189d4b9 extra_text 2012-03-10~23:55:00\r\n
set 498:5634:6047172ecc 2012-03-10~23:55:00\r\n
set 565:0:bf7a80ee4f 2012-03-10~23:55:00

1) Every line should start with the word "set" and end with "\r\n".

2) Every line should have exactly 3 fields, delimited by spaces.

For the example data, it should return an invalid row count of 2, and preferably the offending lines as well. The third line has an extra word and the fifth line does not end correctly.

Upvotes: 0

Views: 55

Answers (2)

Kevin

Reputation: 56129

awk is good for this. A fairly full-featured script:

#!/usr/bin/awk -f

BEGIN {ends = fields = total = 0 }

NF != 3 || !/\r$/ {
    total++
    if(NF != 3) fields++
    if(!/\r$/) ends++
    print
}

END {
    # print (not printf) so each count ends with a newline
    print "Wrong number of fields: " fields
    print "Did not end in a CR: " ends
    print "Total: " total
}

Short one-liner, only prints offending lines:

awk 'NF != 3 || !/\r$/' file

Prints and counts total:

awk 'NF!=3||!/\r$/{total++; print} END{print "Total: " total+0}' file
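
None of the awk patterns above check that a line starts with "set". A minimal sketch that adds that test, assuming the file has real CRLF line endings (an actual CR byte before each newline); the function name report_invalid is hypothetical:

```shell
# Hypothetical helper: print each invalid line, then the invalid count.
# Assumes real CRLF endings, i.e. a CR byte just before each newline.
report_invalid() {
    awk '
        # Valid only if the line starts with "set ", has exactly three
        # whitespace-separated fields, and ends with a carriage return.
        !(/^set / && NF == 3 && /\r$/) { bad++; print }
        END { print "Invalid rows: " (bad + 0) }
    ' "$1"
}
```

Calling report_invalid file prints the offending lines followed by a final "Invalid rows: N" line.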

Upvotes: 1

ruakh

Reputation: 183582

To print the invalid rows:

grep -v '^set [^ ][^ ]* [^ ][^ ]*\\r\\n$' FILENAME

To print the number of invalid rows:

grep -cv '^set [^ ][^ ]* [^ ][^ ]*\\r\\n$' FILENAME
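
The patterns above match the four characters \r\n as literal text, which is how the sample data shows them. If the file instead has real CRLF line endings, a variant might look like the sketch below; the function name count_invalid_crlf is hypothetical, and it assumes a grep that does not strip CR bytes on input (the usual behaviour on Linux):

```shell
# Hypothetical helper: count lines failing the check when the file
# has real CRLF endings (assumption: the CR is a byte, not the text "\r").
count_invalid_crlf() {
    cr=$(printf '\r')   # a literal carriage-return character
    # grep removes the newline itself, so the CR is the last character it sees.
    grep -cv "^set [^ ][^ ]* [^ ][^ ]*${cr}\$" "$1"
}
```

Dropping the c from -cv prints the invalid lines instead of counting them.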

Upvotes: 1
