Shreyas P V

Reputation: 133

awk - match strings containing characters in both upper & lower case and blank spaces

I have an input file in .csv format containing tax invoice records, with fields separated by commas.

For example:

Header--TIN | NAME | INV NO | DATE | NET | TAX | OTHERS | TOTAL
Record1-29001234768 | A S Spares | AB012 | 23/07/2016 | 5600 | 200 | 10 | 5810
Record2-29450956221 | HONDA Spare Parts | HOSS0987 |29/09/2016 | 70000 | 2200 | 0 | 72200

I want to process these records using awk. My requirements:

1) The 'NAME' field must contain only letters and spaces (no digits or special characters), and its length, including spaces, must not exceed 30. If either condition fails, I want to report an error to the user by printing only the offending records.

2) The 'INV NO' field is alphanumeric, so it must not contain special characters or blank spaces, and its length must not exceed 15.

Can anyone give me regular expressions that match these requirements and explain how to apply them?

Upvotes: 1

Views: 1051

Answers (2)

Neil McGuigan

Reputation: 48256

Something like:

awk -f check.awk input.csv

where check.awk is:

BEGIN {
  FS=","  # the input field separator
}

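# skip the header (NR>1); flag field 2 if it has a non-letter, non-space character or is longer than 30
# the parentheses matter: && binds tighter than ||, so without them the length test would also run on the header
NR>1 && ($2 ~ /[^a-zA-Z ]/ || length($2)>30) {print "error with NAME "$1}

# skip the header (NR>1); flag field 3 if it has a non-alphanumeric character or is longer than 15
NR>1 && ($3 ~ /[^0-9a-zA-Z]/ || length($3)>15) {print "error with INV NO "$1}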

If you use gawk, you can set the IGNORECASE variable to a non-zero value to make regex matching case-insensitive and write the regexes in a single case.
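
To try these checks end to end, here is a minimal sketch. It assumes the file really is comma-separated (the question shows " | " between fields, possibly only for readability). The sample file contents, the third (deliberately invalid) record, and the trim helper are hypothetical additions for illustration, not part of the original script.

```shell
#!/bin/sh
# Hypothetical sample data: comma-separated, header on line 1.
sample=$(mktemp)
cat > "$sample" <<'EOF'
TIN,NAME,INV NO,DATE,NET,TAX,OTHERS,TOTAL
29001234768,A S Spares,AB012,23/07/2016,5600,200,10,5810
29450956221,HONDA Spare Parts,HOSS0987,29/09/2016,70000,2200,0,72200
29111111111,Spares 4 U,AB 013,01/10/2016,100,5,0,105
EOF

check_invoices() {
  awk -F',' '
    # Hypothetical helper: strip leading/trailing blanks before validating.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NR > 1 {
      name = trim($2); inv = trim($3)
      # NAME: letters and spaces only, at most 30 characters.
      if (name ~ /[^a-zA-Z ]/ || length(name) > 30)
        print "error with NAME " $1
      # INV NO: letters and digits only, at most 15 characters.
      if (inv ~ /[^0-9a-zA-Z]/ || length(inv) > 15)
        print "error with INV NO " $1
    }
  ' "$1"
}

result=$(check_invoices "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this sample, only the third record is reported, once for NAME (it contains a digit) and once for INV NO (it contains a space). Trimming is a design choice: drop it if surrounding blanks should count as errors.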

Upvotes: 2

redneb

Reputation: 23870

If your system has a modern grep (i.e. one that supports the -P option), then I think it would be easier to solve this using grep, e.g. like this:

grep -viP '^[^|]* \| [a-z ]{0,30} \| [a-z0-9]{0,15} \|' file.txt

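The above command should print all lines that do not satisfy your requirements. It assumes the fields are separated by " | " as in your example, and it will also print the header line, because "INV NO" contains a space.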
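
A minimal sketch for checking this approach, assuming the " | "-separated layout from the question. The third record is hypothetical and deliberately invalid. Two choices here are mine rather than the answer's: the pattern uses only extended-regex features, so -E is used in place of -P, and the NAME class is limited to letters and spaces to match the first requirement.

```shell
#!/bin/sh
# Sample records in the pipe-separated layout (no header line).
# The third record is hypothetical: its NAME contains a digit.
records='29001234768 | A S Spares | AB012 | 23/07/2016 | 5600 | 200 | 10 | 5810
29450956221 | HONDA Spare Parts | HOSS0987 |29/09/2016 | 70000 | 2200 | 0 | 72200
29111111111 | Spares 4 U | AB013 | 01/10/2016 | 100 | 5 | 0 | 105'

# -v inverts the match, -i ignores case, -E selects extended regexes.
# NAME: letters and spaces, at most 30; INV NO: letters and digits, at most 15.
invalid=$(printf '%s\n' "$records" |
  grep -viE '^[^|]* \| [a-z ]{0,30} \| [a-z0-9]{0,15} \|')
printf '%s\n' "$invalid"
```

Only the third record is printed. Unlike the awk script, this reports the whole line and does not say which field failed.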

Upvotes: 0
