Reputation: 133
I have an input file in .csv format that contains tax invoice entries, with the fields separated by commas.
For example:
Header--TIN | NAME | INV NO | DATE | NET | TAX | OTHERS | TOTAL
Record1-29001234768 | A S Spares | AB012 | 23/07/2016 | 5600 | 200 | 10 | 5810
Record2-29450956221 | HONDA Spare Parts | HOSS0987 | 29/09/2016 | 70000 | 2200 | 0 | 72200
My aim is to process these records using awk. My requirements:
1) I need to check that the 'NAME' field contains no special characters or digits (i.e. it should be a purely alphabetical string) and that its length, including spaces, does not exceed 30. If these conditions are not satisfied, I should report an error to the user by printing only the offending records.
2) I need to check that the 'INV NO' field contains no special characters, including blank spaces (INV NO is an alphanumeric field). Its length should also not exceed 15.
Can anyone please give me regular expressions that match these requirements, and explain how to implement the checks?
Upvotes: 1
Views: 1051
Reputation: 48256
Something like:
awk -f check.awk input.csv
where check.awk is:
BEGIN {
FS="," # the input field separator
}
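# skip the header (NR>1); flag field 2 if it contains anything other than letters and spaces, or is longer than 30
# (the parentheses matter: && binds tighter than ||, so without them the length test would also run on the header)
NR>1 && ($2 ~ /[^a-zA-Z ]/ || length($2)>30) {print "error with NAME "$1}
# skip the header (NR>1); flag field 3 if it contains anything other than letters and digits, or is longer than 15
NR>1 && ($3 ~ /[^0-9a-zA-Z]/ || length($3)>15) {print "error with INV NO "$1}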
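To see the two rules in action, here is a minimal sketch run against a small made-up data set. The rows in `data` (in particular the last two, which are deliberately invalid) are assumptions for illustration only, and the fields are assumed to be comma-separated with no padding around them. Unlike the script above, it prints the whole offending record, which is what the question asks for.

```shell
# Hypothetical comma-separated sample; the last two rows are deliberately invalid.
data='TIN,NAME,INV NO,DATE,NET,TAX,OTHERS,TOTAL
29001234768,A S Spares,AB012,23/07/2016,5600,200,10,5810
29450956221,HONDA Spare Parts,HOSS0987,29/09/2016,70000,2200,0,72200
29000000001,Spares 4 U,XY001,01/10/2016,100,5,0,105
29000000002,Good Name,INV 001,02/10/2016,100,5,0,105'

# NR > 1 skips the header; the parentheses make that apply to both tests.
errors=$(printf '%s\n' "$data" | awk -F, '
NR > 1 && ($2 ~ /[^a-zA-Z ]/   || length($2) > 30) { print "NAME error: " $0 }
NR > 1 && ($3 ~ /[^0-9a-zA-Z]/ || length($3) > 15) { print "INV NO error: " $0 }
')
printf '%s\n' "$errors"
```

With this sample, only the "Spares 4 U" row is reported as a NAME error (it contains a digit) and only the "INV 001" row as an INV NO error (it contains a space). If the real file has spaces around the commas, the fields would need trimming first, since the INV NO check treats any space as an error.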
If you use gawk, you can set the IGNORECASE global variable and write case-insensitive regexes.
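IGNORECASE is a gawk extension, so it has no effect in other awk implementations. A portable way to get a similar result is to lowercase the field with the standard `tolower()` function before matching. The sketch below assumes two made-up names as input and uses a hypothetical `status` variable purely for illustration.

```shell
# Portable alternative to gawk's IGNORECASE: lowercase the text before matching.
# The two names are made-up test values.
result=$(printf '%s\n' 'HONDA Spare Parts' 'Spares 4 U' | awk '
{
    # "bad" if anything other than a lowercase letter or space remains
    status = (tolower($0) ~ /[^a-z ]/) ? "bad" : "ok"
    print status ": " $0
}')
printf '%s\n' "$result"
```

Here the first name is reported as ok and the second as bad, because of the digit.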
Upvotes: 2
Reputation: 23870
If your system has a modern grep (i.e. one that supports the -P option), then I think it would be easier to solve this using grep, e.g. like this:
grep -viP '^[^|]* \| [a-z ]{0,30} \| [a-z0-9]{0,15} \|' file.txt
The above command should print all lines that do not satisfy your requirements.
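As a rough check of this approach, the sketch below runs a similar pattern over a made-up, pipe-delimited sample in the layout shown in the question. The rows in `data` are assumptions for illustration, the NAME part is restricted to letters and spaces to match the question's first requirement, and `-E` is used instead of `-P` because this particular pattern only needs extended regular expressions.

```shell
# Hypothetical pipe-delimited sample; the last two rows are deliberately invalid.
data='29001234768 | A S Spares | AB012 | 23/07/2016 | 5600 | 200 | 10 | 5810
29450956221 | HONDA Spare Parts | HOSS0987 | 29/09/2016 | 70000 | 2200 | 0 | 72200
29000000001 | Spares 4 U | XY001 | 01/10/2016 | 100 | 5 | 0 | 105
29000000002 | Good Name | INV 001 | 02/10/2016 | 100 | 5 | 0 | 105'

# The pattern describes a VALID record; -v inverts it, so invalid lines are printed.
bad=$(printf '%s\n' "$data" |
  grep -viE '^[^|]* \| [a-z ]{0,30} \| [a-z0-9]{0,15} \|')
printf '%s\n' "$bad"
```

Only the last two rows are printed. Note that a header line such as the one in the question would be printed as well, because "INV NO" contains a space, so the header has to be removed or ignored separately.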
Upvotes: 0