Reputation: 439
I have written a script to subset some large flat files, with the integer number of subsets (i.e. $increment) as a user input variable. I am observing a strange behavior (bash syntax error) only when this input argument is an odd integer. For the purpose of (marginally) improved clarity, I have replicated this error behavior in a stripped down version of my original shell script. I am unable to provide the flat files, but hopefully somebody with shell/bash expertise can diagnose what's going on here just by looking at the code and error messages (I already ran this through http://www.shellcheck.net/ and didn't find any serious problems).
When $increment is set to an even integer (e.g. 8), the shell script executes free of errors and outputs the desired print statements (see "NOTE" below) for every iteration of the while loop. Here is some sample output from those print statements:
Line of interest: span2=84688
Line of interest: span2=85225
Line of interest: span2=86323
...
When $increment is odd, however (e.g. 9), the script fails at line 48, "span2=$(($line2-$last2))", with the error message:
test_case.sh: line 48: 153026
153027-77419: syntax error in expression (error token is "153027-77419")
This is strange because the output of the preceding echo statement, "Line of interest: span2=75278", indicates that the same arithmetic expansion evaluated without errors just before the failing line. So there is apparently nothing special about the integers being subtracted here, but it is odd that the error message appears to be off by one: it prints "153026" when the operand $line2 should equal "153027". I'm not sure if or how this is related to the syntax error, though.
#!/bin/bash
set -e
increment=9
file1="path/to/file1"
file2="path/to/file2"
file3="path/to/file3"
# End index of header in first file
file1_start=2138
midpoint=$(( $file1_start + 1 ))
file1_wc=($(wc $file1))
file2_wc=($(wc $file2))
file3_wc=($(wc $file3))
# Get a line count for the three different flat text files, as an upper bound index
ceil1=${file1_wc[0]}
ceil2=${file2_wc[0]}
ceil3=${file3_wc[0]}
# Initialize end point indices
line="$(head -$midpoint $file1 | tail -1 | awk '{print $1;}')"
line2=$(grep -n -e "$line" $file2 | cut -f1 -d:)
line3=$(grep -n -e "$line" $file3 | cut -f1 -d:)
# Initialize starting point indices
last1=$midpoint
last2=$line2
last3=$line3
# Update "midpoint" index
midpoint=$(($midpoint+$ceil1/$increment))
while [ $midpoint -lt $ceil1 ]
do
line="$(head -$midpoint $file1 | tail -1 | awk '{print $1;}')"
line2=$(grep -n -e "$line" $file2 | cut -f1 -d:)
line3=$(grep -n -e "$line" $file3 | cut -f1 -d:)
# Calculate range of indices for subset number $increment
span1=$(($midpoint-$last1))
echo "Line of interest: span2=$(($line2-$last2))"
# ***NOTE***: The below statement is where it is failing for odd $increment
span2=$(($line2-$last2))
span3=$(($line3-$last3))
# Set index variables for next iteration of file traversal
index=$(($index+1))
last1=$midpoint
last2=$line2
last3=$line3
# Increment midpoint index variable
midpoint=$(($midpoint+$ceil1/$increment))
done
Your feedback is much appreciated, thanks in advance.
UPDATE: By adding "set -x" and looking at the trace output, I determined that the expression
line2=$(grep -n -e "$line" $file2 | cut -f1 -d:)
was matching more than one line. Thus, in the example I provided above, $line2 was equal to "153026\n153027", which is not a valid operand for subtraction, hence the syntax error. A way to resolve this is to pipe to head, e.g.
line2=$(grep -n -e "$line" $file2 | cut -f1 -d: | head -1)
to only consider the first line yielded by grep.
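To make the failure mode concrete, here is a minimal sketch that reproduces it on made-up data. The temporary file and the key "abc" are assumptions standing in for file2 and $line; they are not taken from the original flat files. It shows that a value holding two line numbers separated by a newline is rejected by $(( )), which is also why the error message printed "153026" and "153027-77419" on separate lines, and that keeping only the first match yields a usable integer.

```shell
#!/bin/bash
# Hypothetical sample data standing in for file2; the key "abc" occurs on two lines.
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
printf '%s\n' 'xyz 1' 'abc 2' 'abc 3' > "$tmpfile"

# Without a limit, grep reports every matching line number, one per line.
all_matches=$(grep -n -e "abc" "$tmpfile" | cut -f1 -d:)

# A multi-line value is not a valid arithmetic operand, so this expansion fails.
# It runs in a subshell so that the failure does not abort the demo itself.
if ( : "$(( $all_matches - 1 ))" ) 2>/dev/null; then
    multi_ok=yes
else
    multi_ok=no
fi

# Keeping only the first match yields a single integer.
first_match=$(grep -n -e "abc" "$tmpfile" | cut -f1 -d: | head -n 1)
span=$(( first_match - 1 ))

echo "multi-line operand accepted: $multi_ok"
echo "first match: $first_match, span: $span"
```

Whether the first match is the right one to keep depends on the data; if the key is expected to be unique in file2, duplicate matches may point to a pattern that is too loose.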
Upvotes: 0
Views: 2423
Reputation: 19395
ncemami: By adding "set -x" and looking at the trace output, I determined that the expression
line2=$(grep -n -e "$line" $file2 | cut -f1 -d:)
was matching more than one line. Thus, in the example I provided above, $line2 was equal to "153026\n153027", which is not a valid operand for subtraction, hence the syntax error. A way to resolve this is to pipe to head, e.g.
line2=$(grep -n -e "$line" $file2 | cut -f1 -d: | head -1)
to only consider the first line yielded by grep.
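As a possible alternative to trimming grep's output, the lookup could match the key exactly and validate the result before it reaches $(( )). The sketch below assumes that $line is meant to equal the first whitespace-separated field of a line in the target file, which the original post does not confirm. The helper name first_line_of and the sample data are hypothetical.

```shell
#!/bin/bash
# first_line_of KEY FILE
# Hypothetical helper: prints the number of the first line in FILE whose first
# whitespace-separated field equals KEY exactly; fails if no such line exists.
first_line_of() {
    local key=$1 file=$2 n
    # Appending "" forces a string comparison; exit stops at the first hit.
    # Note: awk -v interprets backslash escapes in the assigned value.
    n=$(awk -v key="$key" '($1 "") == (key "") { print NR; exit }' "$file")
    # Guard: accept only a single run of digits as an arithmetic operand.
    if [[ $n =~ ^[0-9]+$ ]]; then
        printf '%s\n' "$n"
    else
        echo "no line number found for '$key' in $file" >&2
        return 1
    fi
}

# Made-up data: a grep for "abc" would match all three lines,
# while an exact comparison on the first field matches only the intended ones.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '%s\n' 'abcd 1' 'abc 2' 'abc 3' > "$demo"

line2=$(first_line_of abc "$demo") || exit 1
echo "first exact match for abc is on line $line2"
```

Because the helper returns a non-zero status when nothing matches, a script running under "set -e" stops at the lookup with a readable message instead of at a later arithmetic expansion.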
Upvotes: 2