Tushar Kesarwani
Tushar Kesarwani

Reputation: 43

Best practice to reduce the complexity of the UNIX Shell script

I have a script which is having too many modules. The script is working perfectly, but the only issue is it is taking too much time and I need to reduce the complexity of the script.

What Script Is Doing :-

I have a source file ( It can have multiple rows of data, each column separated by a '|' ).

My script should check all the rows for all the columns and find the columns which have incorrect data. ( Eg, Null in a not null column, space in a not space column, alphabet's in a number only column, and any value which is not a valid value for that column ).

Now I have a Master file, it is having those column names and the position of the concurrence of that column in the source file with some indicators which will decide what all we have to perform on that column.

E.g., of master file -

MBR_SRC_SYS,15,H,N,N,N,1,FAC|NSC|WGS|ACE|CS9|CHP
CDH_AMT,27,H,1,1,1,N,N

The above is the example of the mater file. So 15 column in my source file will be MBR_SRC_SYS and 'H' Indicates i have to perform HARD ERROR Check. Later is the indicator for the other checks like Not Null, Not Space, Number Format, and Valid Value and the last contains the valid values this particular column can have.

Format of master file -

Clm Name,Position,HardErrorCheck,NullCheck,SpaceCheck,NumberCheck,ValidValueCheck,ValidValues

Eg of source file -

DTL|CLM| |RMB_CLM_2015_V01|RMBFCSNSC|15135NSC|rmb_fcsdmsclm_n.dat|rmb_fcsdmsclm_n.trg|NSC  2015093QA01109920150514 4 3132PD|WFS|2015093QA011099|2015/05/14 04:31:32|131|99|NAP||27002|6MB09|          | |2014/10/04 00:00:00|2015/05/15 00:00:00|2015/05/15 00:00:00|2015/05/15 00:00:00|2015/05/15 00:00:00|97.26|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0|0|WIFE                           |A|SMITH                          |603T40775           |270022222           |603T40775           |20       |F|1966/12/02 00:00:00|SUB|MD|SN|G2001|NTINCR|MFFFF| |91|91| |NSC|NAIIH11X|H| | | | | | | | | | | | |2015/05/15 00:00:00| | | | | | | | | | | | | | |W|D|N|P|PF|OP|N|CHK|0001025000| 7021693440|351159676      |Y| |I~IN~ ~ ~ ~ | ~ ~ | | | | | | | | |0.00|0.00|0.00| | | | | | | | | | | | | | | | | 

So, what I am doing is having 2 while loops. With one loop I am reading the lines of the source file. Now inside this while loop after reading i am having another while loop to read the Master File and do all the necessary checks.

So suppose my source file is having 500 records wich 100 columns. Total loop count will be 500*100 = 50000 Itterations. This is too much, Because my source file can have records like 5 Millions.

I am new to Unix, so i am not sure, but scripting languages like PHP have a timeout and will definitely not be able to run this long. So what should be the solution for this ?

Below is the complete code i have -

#Log file creation

#exec 1> $CODE/WCC_FOA_RMBHUB/logs/RMB_HARD_ERROR_CHECK_ROWMAJOR_$(date +"%Y%m%d_%H%M%S").log 2>&1

SrcFilePath=$PMDIR/SrcFiles/WCC_FOA_RMBHUB
ListFilePath=$CODE/WCC_FOA_RMBHUB/lists

SrcFileName=$1
HardErrorFile=HARD_ERR_RMB_TEST_FILE.dat
MasterFile=$2

NullCheck()
{
value="$1"

if [ ! -n "$value" ] ; then

    return 10
else
    return 1

fi
}

SpaceCheck()
{
value="$1"
NullCheck "$value"
t=$?
if [ "$t" != "10" ]; then

value=$(echo $value)

    if [ ! -z "$value" -a "$value" != " " ]; then
        return 1
    else
        return 10
    fi
else
    return 1
fi

}

NumberCheck()
{
value="$1"
if [ $value -eq $value 2>/dev/null ]; then
    return 1
else
    return 10
fi

}

ValidCheck()
{
value="$1"


# Replace space with 'SPACE'

SpaceCheck "$value"
t1=$?
if [ "$t1" == "10" ]; then
    value="SPACE"
fi

ValidFromMapping="$2"
needle="|"
NumberOfDelimeter=$(echo $ValidFromMapping |  awk 'BEGIN{FS="|"} {print NF}')
v=1
vf=0

while [ $NumberOfDelimeter -gt 0 ]
do

    #echo YES
    ValidValue=$(echo "$ValidFromMapping" | cut -d'|' -f$v)
    #echo "Checking with the value $ValidValue."
    if [ "$ValidValue" == "$value" ]; then
        vf=1    
    fi

    v=$(expr $v + 1)
    NumberOfDelimeter=$(expr $NumberOfDelimeter - 1)

done

if [ $vf -eq 1 ]; then
    #echo "Match Found"
    return 1
else
    #echo "No Match Found"
    return 10
fi

}
SoftErrThreshHold=0

        #echo "Errrr... We got a hard error check column. We have to check $ColumnName in the whole god dam source table. This column is available in column $ColNum .We got to do it brah !!!\n"
        while read RowOne
        do
            HardErrorFlag=1
            SoftErrorFlag=1
            RecordColumn=$(echo "$RowOne" | cut -d'|' -f1)

            #echo "Started New Row"

            if [ "$RecordColumn" == "DTL" ]; then

                        KeyColumn=$(echo "$RowOne" | cut -d'|' -f9)
                        ErrorText="$KeyColumn |HARD ERROR OCCURRED IN THE FOLLOWING COLUMN(S):"
                        SoftErrorText="$KeyColumn |SOFT ERROR OCCURRED IN THE FOLLOWING COLUMN(S):"
                        while read line
                        do
                            ColumnName=`echo $line | cut -d',' -f1`
                            ColNum=`echo $line | cut -d',' -f2`
                            ErrorCheck=`echo $line | cut -d',' -f3`


                            NullCheckStatus=`echo $line | cut -d',' -f4`
                            SpaceCheckStatus=`echo $line | cut -d',' -f5`
                            NumberCheckStatus=`echo $line | cut -d',' -f6`
                            ValidCheckStatus=`echo $line | cut -d',' -f7`
                            ValidValues=`echo $line | cut -d',' -f8`



                            # Hard Error Check - 

                            if [ "$ErrorCheck" == "H" ]; then

                                #echo "Doing Hard Error"
                                IsNull=1;
                                IsNumber=1;
                                IsSpace=1;
                                IsValid=1;

                                HardErrorCheckColumnValue=$(echo "$RowOne" | cut -d'|' -f$ColNum)
                               #echo "Value is $HardErrorCheckColumnValue."

                                #Funtions Return 10 if it is NULL/SPACE/NOT A NUMBER

                                #echo $NullCheckStatus
                                if [ "$NullCheckStatus" == "1" ]; then 
                                    NullCheck "$HardErrorCheckColumnValue"
                                    IsNull=$?
                                fi

                                #echo $NumberCheckStatus
                                if [ "$NumberCheckStatus" == "1" ]; then 
                                    NumberCheck "$HardErrorCheckColumnValue"
                                    IsNumber=$?

                                    NullCheck "$HardErrorCheckColumnValue"
                                    IsNull=$?

                                    SpaceCheck "$HardErrorCheckColumnValue"
                                    IsSpace=$?

                                fi

                                #echo $SpaceCheckStatus
                                if [ "$SpaceCheckStatus" == "1" ]; then 
                                    SpaceCheck "$HardErrorCheckColumnValue"
                                    IsSpace=$?
                                fi

                                #echo $ValidCheckStatus

                                if [ "$ValidCheckStatus" == "1" ]; then
                                    ValidCheck "$HardErrorCheckColumnValue" "$ValidValues"
                                    IsValid=$?                              
                                fi

                                if [ $IsNull -eq 10 ] || [ $IsNumber -eq 10 ] || [ $IsSpace -eq 10 ] || [ $IsValid -eq 10 ]; then

                                    HardErrorFlag=10

                                    if [ $IsNull -eq 10 ]; then
                                        ErrorText=$ErrorText" $ColumnName is Null or Invalid; "
                                    fi
                                    if [ $IsNumber -eq 10 ]; then

                                        ErrorText=$ErrorText" $ColumnName is not a valid number; "
                                    fi
                                    if [ $IsSpace -eq 10 ]; then
                                        ErrorText=$ErrorText" $ColumnName is Space or Invalid; "
                                    fi

                                    if [ $IsValid -eq 10 ]; then
                                        ErrorText=$ErrorText" $ColumnName is not a valid value; "
                                    fi

                                fi  
                            fi

                            # Soft Error Check -

                            if [ "$ErrorCheck" == "S" ]; then
                                #echo "Doing Soft Error"
                                IsSValid=1
                                IsSNull=1;
                                IsSNumber=1;
                                IsSSpace=1;


                                SoftErrorCheckColumnValue=$(echo "$RowOne" | cut -d'|' -f$ColNum)
                                #echo "Value is $SoftErrorCheckColumnValue."

                                #Funtions Return 10 if it is NULL/SPACE/NOT A NUMBER

                                #echo $NullCheckStatus
                                if [ "$NullCheckStatus" == "1" ]; then 
                                    NullCheck "$SoftErrorCheckColumnValue"
                                    IsSNull=$?
                                fi

                                #echo $NumberCheckStatus
                                if [ "$NumberCheckStatus" == "1" ]; then 
                                    NumberCheck "$SoftErrorCheckColumnValue"
                                    IsSNumber=$?

                                    NullCheck "$SoftErrorCheckColumnValue"
                                    IsSNull=$?

                                    SpaceCheck "$SoftErrorCheckColumnValue"
                                    IsSSpace=$?

                                fi

                                #echo $SpaceCheckStatus
                                if [ "$SpaceCheckStatus" == "1" ]; then 
                                    SpaceCheck "$SoftErrorCheckColumnValue"
                                    IsSpace=$?
                                fi


                                #echo $ValidCheckStatus

                                if [ "$ValidCheckStatus" == "1" ]; then
                                    ValidCheck "$SoftErrorCheckColumnValue" "$ValidValues"
                                    IsSValid=$?
                                fi

                                if [ $IsSNull -eq 10 ] || [ $IsSNumber -eq 10 ] || [ $IsSSpace -eq 10 ] || [ $IsSValid -eq 10 ]; then

                                    SoftErrorFlag=10


                                    if [ $IsSNull -eq 10 ]; then
                                        SoftErrorText=$SoftErrorText" $ColumnName is Null or Invalid; "
                                    fi
                                    if [ $IsSNumber -eq 10 ]; then

                                        SoftErrorText=$SoftErrorText" $ColumnName is not a valid number; "
                                    fi
                                    if [ $IsSSpace -eq 10 ]; then
                                        SoftErrorText=$SoftErrorText" $ColumnName is Space or Invalid; "
                                    fi

                                    if [ $IsSValid -eq 10 ]; then
                                        SoftErrorText=$SoftErrorText" $ColumnName is not a valid value; "

                                    fi

                                    SoftErrThreshHold=$(expr $SoftErrThreshHold + 1)
                                fi

                            fi

                        done <$ListFilePath/$MasterFile

                        if [ $HardErrorFlag -eq 10 ]; then
                            echo $ErrorText

                        fi

                        if [ $SoftErrorFlag -eq 10 ]; then
                            echo $SoftErrorText

                        fi
            fi
        done <$SrcFilePath/$SrcFileName

if [ $SoftErrThreshHold -eq 0 ]; then
        echo "No Soft Error was found in the file"
else
        echo "Total Count Of SofErrors Found : $SoftErrThreshHold"
fi

Looking forward for a reply.

Upvotes: 0

Views: 212

Answers (1)

glenn jackman
glenn jackman

Reputation: 247012

This question is more appropriate to a site like https://codereview.stackexchange.com/.

The main problem is that you're reading the master file for every line of the source file. Totally unnecessary. You should read it once, store its information, then loop over the source file. Fix that, and you'll see a noticable improvement.

However, if performance is the main concern, shell is the wrong tool for the job. You're making multiple calls to external tools like awk, cut, expr. The shell can do arithmetic and split a string on a delimiter. I'd rewrite the whole thing for awk or perl that have all of the text processing function you require before I'd spend much time trying to make the shell program more efficient.


A particular shell improvement: split a string on commas and save the fields: Do this

IFS=',' read ColumnName ColNum ErrorCheck NullCheckStatus SpaceCheckStatus NumberCheckStatus ValidCheckStatus ValidValues <<< "$line"

instead of this:

ColumnName=`echo $line | cut -d',' -f1`
ColNum=`echo $line | cut -d',' -f2`
ErrorCheck=`echo $line | cut -d',' -f3`
NullCheckStatus=`echo $line | cut -d',' -f4`
SpaceCheckStatus=`echo $line | cut -d',' -f5`
NumberCheckStatus=`echo $line | cut -d',' -f6`
ValidCheckStatus=`echo $line | cut -d',' -f7`
ValidValues=`echo $line | cut -d',' -f8`

How to read Master file:

# not strictly necessary: declare some array variables
declare -a colname errchk nullchk spacechk numchk validchk values

while IFS=',' read -r ColumnName ColNum ErrorCheck NullCheck SpaceCheck NumberCheck ValidCheck ValidValues
do
    colname[$ColNum]=$ColumnName
    errchk[$ColNum]=$ErrorCheck
    nullchk[$ColNum]=$NullCheck
    spacechk[$ColNum]=$SpaceCheck
    numchk[$ColNum]=$NumberCheck
    validchk[$ColNum]=$ValidCheck
    values[$ColNum]=$ValidValues
done < MasterFile

Then read the source file:

while IFS='|' read -ra fields; do

    for (( i=0; i < ${#fields[@]}; i++ )); do
        if [[ -n ${colname[$i]} ]]; then 
            # there was an entry for this column number in the Master file
            # perform your checks for column number $i
            # the column value is "${fields[$i]}"
        fi
    done

    # check your hard/soft error flags here

done < SourceFile

Upvotes: 2

Related Questions