awk - match strings containing characters in both upper & lower case and blank spaces

Question

I have an input file in .csv format that contains tax invoice entries with comma-separated fields.

For example:

Header--TIN | NAME | INV NO | DATE | NET | TAX | OTHERS | TOTAL
Record1-29001234768 | A S Spares | AB012 | 23/07/2016 | 5600 | 200 | 10 | 5810
Record2-29450956221 | HONDA Spare Parts | HOSS0987 |29/09/2016 | 70000 | 2200 | 0 | 72200

I want to process these records using awk. My requirements:

1) I need to check the 'NAME' field for special characters and digits (i.e. it should be a purely alphabetical string), and the length of the string in the 'NAME' field (including spaces) should not exceed 30. If these conditions are not satisfied, I should report an error to the user by printing only the offending records.

2) I need to check the 'INV NO' field for special characters including blank spaces (INV NO is an alphanumeric field). I also need to check the length of the contents of this field and it should not exceed 15.

Can anyone please give me regular expressions that match these requirements, and explain how to implement the checks?

Neil McGuigan · Accepted Answer

Something like:

awk -f check.awk input.csv

where check.awk is:

BEGIN {
  FS=","  # the input field separator
}

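# skip the header (NR>1), check regex for field 2, check length of field 2
# (the parentheses are needed: && binds tighter than ||, so without them
# the length check would also run on the header line)
NR>1 && ($2 ~ /[^a-zA-Z ]/ || length($2)>30) {print "error with NAME "$1}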

# skip the header (NR>1), check regex for field 3, check length of field 3
# (parenthesized so that NR>1 guards both checks)
NR>1 && ($3 ~ /[^0-9a-zA-Z]/ || length($3)>15) {print "error with INV NO "$1}
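To try these checks end to end, here is a minimal sketch that wraps the two rules in a shell function and feeds it a few records. The function name validate_invoices and the sample rows are made up for illustration, and the sketch assumes the real file has no spaces around the commas. The regex and length tests are grouped in parentheses so that NR > 1 applies to both.

```shell
#!/bin/sh
# validate_invoices is a hypothetical helper name, not part of the original answer.
# It reads comma-separated invoice records on stdin and prints one line per failed check.
validate_invoices() {
  awk -F',' '
    # Skip the header (NR > 1). The parentheses make NR > 1 guard both the
    # character test and the length test.
    NR > 1 && ($2 ~ /[^a-zA-Z ]/ || length($2) > 30)   { print "error with NAME " $1 }
    NR > 1 && ($3 ~ /[^0-9a-zA-Z]/ || length($3) > 15) { print "error with INV NO " $1 }
  '
}

# Sample data, invented for this sketch: row 3 has a space in INV NO,
# row 4 has digits in NAME.
printf '%s\n' \
  'TIN,NAME,INV NO,DATE,NET,TAX,OTHERS,TOTAL' \
  '29001234768,A S Spares,AB012,23/07/2016,5600,200,10,5810' \
  '29450956221,HONDA Spare Parts,HOSS 0987,29/09/2016,70000,2200,0,72200' \
  '29000000001,R2D2 Motors,INV001,01/10/2016,100,5,0,105' |
  validate_invoices
```

With this sample input, the sketch should report the INV NO error for 29450956221 and the NAME error for 29000000001, and stay silent for the first record. To print the whole offending record instead of just the TIN, replace $1 with $0 in the print statements.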

If you use gawk, you can set the IGNORECASE variable to a non-zero value and write case-insensitive regexes, e.g. /[^a-z ]/ instead of /[^a-zA-Z ]/.
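As a rough sketch of the case-insensitive variant for the NAME check: IGNORECASE is a gawk extension, so the function below (check_name_ci, a hypothetical name) uses gawk only when it is installed and otherwise falls back to lowercasing the field with the standard tolower() function. Both branches are intended to behave the same for plain ASCII names.

```shell
#!/bin/sh
# check_name_ci is a hypothetical helper name used only for this illustration.
# It flags records whose NAME field (field 2) contains anything other than
# letters and spaces, without listing both cases in the bracket expression.
check_name_ci() {
  if command -v gawk >/dev/null 2>&1; then
    # gawk only: a non-zero IGNORECASE makes regex matching case-insensitive
    gawk -F',' 'BEGIN { IGNORECASE = 1 }
      NR > 1 && $2 ~ /[^a-z ]/ { print "error with NAME " $1 }'
  else
    # portable fallback: lowercase the field before matching
    awk -F',' 'NR > 1 && tolower($2) ~ /[^a-z ]/ { print "error with NAME " $1 }'
  fi
}

# Sample data, invented for this sketch.
printf '%s\n' \
  'TIN,NAME' \
  '1,Honda Spare Parts' \
  '2,Honda 2 Go' |
  check_name_ci
```

Only the second record should be reported here, because its name contains a digit.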
