Purple_haze

Reputation: 68

Modifying a shell script to validate a CSV file column by column

I have this script that needs to read ALL fields in a column and validate them before it moves on to the second column. For example:

Name, City

Joe, Orlando
Sam, Copper Town
Mike, Atlanta

So the script should read the entire Name column (top to bottom) and validate it for nulls before it moves to the second column. It should NOT read line by line. Please give some pointers on how to modify or correct it.

# Read all files. No file has spaces in its name.

for file in /export/home/*.csv; do
  # init the variables before processing a new file
  date_regex='^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$'
  FILESTATUS=GOOD
  FIRSTROW=true
  # process the file one line at a time, splitting each line on the
  # Internal Field Separator ","
  # (redirect instead of "cat | while" so FILESTATUS survives the loop)
  while IFS=, read -r field1 field2 field3 field4; do
    # Skip first line, the header row
    if [ "${FIRSTROW}" = "true" ]; then
      FIRSTROW=false
      # skip processing of this line, continue with next record
      continue
    fi

    # field1 must not be empty
    if [[ -z "$field1" ]]; then
      FILESTATUS=BAD
      # Stop inner loop
      break
    fi

    # field2 must be a non-empty date
    if [[ -z "$field2" || ! "$field2" =~ $date_regex ]]; then
      FILESTATUS=BAD
      break
    fi

    # field3 must be either S or E
    if [[ "$field3" != "S" && "$field3" != "E" ]]; then
      FILESTATUS=BAD
      break
    fi

    # field4 must be 9 to 11 characters long
    if (( ${#field4} < 9 || ${#field4} > 11 )); then
      FILESTATUS=BAD
      break
    fi
  done < "${file}"

  if [ "${FILESTATUS}" = "GOOD" ]; then
    mv "${file}" /export/home/goodFile
  else
    mv "${file}" /export/home/badFile
  fi

done

Upvotes: 0

Views: 75

Answers (2)

glenn jackman

Reputation: 246942

This awk script reads the whole file into memory first; you can then do your verification, column by column, in the END block:

for file in /export/home/*.csv ; do
    awk -F', ' '
        # skip the header and blank lines
        NR == 1 || NF == 0 {next}

        # save the data, one row index per input line
        { nr++; for (i=1; i<=NF; i++) data[nr,i] = $i }

        END {
            status = "OK"

            # verify column 1
            for (lineno=1; lineno <= nr; lineno++) {
                if (length(data[lineno,1]) == 0) {
                    status = "BAD" 
                    break
                }
            }
            printf "file: %s, verify column 1, status: %s\n", FILENAME, status

            # verify other columns ...
        }
    ' "$file"
done
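
As a rough sketch of how the "verify other columns" step might be filled in, the function below loops over the columns in the outer loop and over the rows in the inner loop, so each column is checked top to bottom before the next one is touched. The function name validate_columns, the helper names trim and ok, and the four-column layout (name, date, S/E flag, 9 to 11 character code) are assumptions taken from the checks in the question's script, not something the answer above specifies. It splits on a bare comma and trims blanks itself, and it stops at the first bad column.

```shell
#!/bin/sh
# Hypothetical helper: validate_columns FILE
# Prints one status line per checked column; exit status is 1 if a column is bad.
validate_columns() {
    awk -F',' '
        # strip leading and trailing blanks from a field
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        # return 1 if value v is acceptable for column c, else 0
        # (rules assumed from the checks in the question)
        function ok(c, v) {
            if (c == 1) return length(v) > 0
            if (c == 2) return v ~ "^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$"
            if (c == 3) return v ~ "^[SE]$"
            return length(v) >= 9 && length(v) <= 11
        }

        # skip the header and blank lines
        NR == 1 || NF == 0 { next }

        # save the data, one row index per input line
        { nr++; for (i = 1; i <= 4; i++) data[nr, i] = trim($i) }

        END {
            # outer loop over columns, inner loop over rows
            for (c = 1; c <= 4; c++) {
                status = "OK"
                for (r = 1; r <= nr; r++)
                    if (!ok(c, data[r, c])) { status = "BAD"; break }
                printf "column %d: %s\n", c, status
                # do not look at later columns once one has failed
                if (status == "BAD") exit 1
            }
        }
    ' "$1"
}
```

Inside the "for file in /export/home/*.csv" loop, the exit status of validate_columns "$file" could then decide whether the file is moved to the good or the bad directory.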

Upvotes: 1

twalberg

Reputation: 62399

Here's an attempt at an awk script that does what it seems like the original script is trying to do:

#!/usr/bin/awk -f

# fields separated by commas
BEGIN { FS = "," }

# skip first line
NR == 1 { next }

# check for empty fields
$1 == "" || $2 == "" || $3 == "" || $4 == "" { exit 1 }

# check for "valid" date (urk... doing this with a regex is horrid)
# it would be better to split it into components and validate each sub-field,
# but I'll leave that as a learning exercise for the reader
$2 !~ /^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$/ { exit 1 }

# third field should be either S or E
$3 !~ /^[SE]$/ { exit 1 }

# check the length of the fourth field is between 9 and 11
length($4) < 9 || length($4) > 11 { exit 1 }

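# if we haven't found problems up to here, then things are good: awk exits
# with status 0 on its own. Don't add END { exit 0 }, since END still runs
# after an earlier exit 1 and would reset the status to 0.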

Save that in e.g. validate.awk, and set the executable bit on it (chmod +x validate.awk), then you can simply do:

if ./validate.awk < somefile.txt
then
  mv somefile.txt goodfiles/
else
  mv somefile.txt badfiles/
fi
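
The comment in the script suggests splitting the date into components instead of relying on one regex. A minimal sketch of that idea, assuming the MM/DD/YYYY order and the separators (dash, space, slash, dot) that the regex accepts; the function name valid_date is hypothetical, and the leap-year rule is the usual Gregorian one:

```shell
#!/bin/sh
# Hypothetical helper: valid_date STRING
# Succeeds (exit status 0) if STRING is a plausible MM/DD/YYYY date.
valid_date() {
    printf '%s\n' "$1" | awk '
        {
            # split on any of the separators the regex allowed: - space / .
            n = split($0, part, "[- /.]")
            if (n != 3) exit 1
            m = part[1]; d = part[2]; y = part[3]

            # each component must be all digits of the expected width
            if (m !~ /^[0-9][0-9]$/ || d !~ /^[0-9][0-9]$/) exit 1
            if (y !~ /^[0-9][0-9][0-9][0-9]$/) exit 1

            # force numeric values, then range-check year and month
            m += 0; d += 0; y += 0
            if (y < 1900 || y > 2099 || m < 1 || m > 12) exit 1

            # days per month, with a leap-year adjustment for February
            split("31 28 31 30 31 30 31 31 30 31 30 31", dim, " ")
            if (m == 2 && (y % 4 == 0 && y % 100 != 0 || y % 400 == 0)) dim[2] = 29
            if (d < 1 || d > dim[m]) exit 1
        }
    '
}
```

Unlike the regex, this rejects dates such as 02/30/2020 or 04/31/2019. The same logic could be moved into an awk function inside validate.awk and called on $2.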

Upvotes: 1
