TheRavenKing

Reputation: 45

Why does my bash script flag this awk substring command as a syntactic error when it works in the terminal?

I'm trying to extract a list of dates from a series of links using lynx's dump function and piping the output through grep and awk. This operation works successfully in the terminal and outputs dates accurately. However, when it is placed into a shell script, bash claims a syntax error:

Scripts/ETC/PreD.sh: line 18: syntax error near unexpected token `('
Scripts/ETC/PreD.sh: line 18: ` lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt'

For context, this is part of a while-read loop in which $link is being read from a file. Operations undertaken inside this while-loop when the awk command is removed are all successful, as are similar while-loops that include other awk commands.

I know that either I'm misunderstanding how bash handles variable substitution, or how bash handles awk commands, or some combination of the two. Any help would be immensely appreciated.

EDIT: ShellCheck is divided on this. The website version finds no error, but my downloaded version reports error SC1083, which says:

This { is literal. Check expression (missing ;/\n?) or quote it.

A check on the Shellcheck GitHub page provides this:

This error is harmless when the curly brackets are supposed to be literal, in e.g. awk {'print $1'}. 
However, it's cleaner and less error prone to simply include them inside the quotes: awk '{print $1}'.

Script follows:

#!/bin/bash

while read -u 4 link
do
        IFS=/ read a b c d e <<< "$link"
        echo "$e" >> 1.txt
        lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
        lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt
done 4< links.txt

Upvotes: 0

Views: 757

Answers (1)

Dudi Boy

Reputation: 4890

  1. In the sed command, you have an unmatched ' because of an unquoted '.

  2. In the awk script, you have a constant zero-length variable.
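For point 1, one way to sidestep the quoting question is to keep the whole sed expression inside a single pair of single quotes. The sketch below is only an illustration: the input string and the variable name `joined` are made up for the example, not taken from the script.

```shell
#!/bin/bash
# Hypothetical input standing in for a comma-separated list of names.
# The greedy \(.*\) captures everything up to the LAST comma, so only
# that final comma is replaced by " and".
joined=$(printf '%s\n' 'a, b, c' | sed 's/\(.*\),/\1 and/')

printf '%s\n' "$joined"
```

With the expression in one quoted string there is no escaped space or quote juggling for the shell to interpret.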

From gawk manual:

substr(string, start [, length ])

Return a length-character-long substring of string, starting at character number start. The first character of a string is character number one. For example, substr("washington", 5, 3) returns "ing".

If length is not present, substr() returns the whole suffix of string that begins at character number start. For example, substr("washington", 5) returns "ington". The whole suffix is also returned if length is greater than the number of characters remaining in the string, counting from character start.

If start is less than one, substr() treats it as if it was one. (POSIX doesn’t specify what to do in this case: BWK awk acts this way, and therefore gawk does too.) If start is greater than the number of characters in the string, substr() returns the null string. Similarly, if length is present but less than or equal to zero, the null string is returned.
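The cases described in the manual excerpt can be checked directly from the shell. This is a minimal sketch; the variable names are chosen here purely for illustration, and each awk call uses only a BEGIN block so no input is needed.

```shell
#!/bin/bash
with_len=$(awk 'BEGIN { print substr("washington", 5, 3) }')   # start 5, length 3
no_len=$(awk 'BEGIN { print substr("washington", 5) }')        # start 5, no length
past_end=$(awk 'BEGIN { print substr("washington", 20) }')     # start beyond the end
zero_len=$(awk 'BEGIN { print substr("washington", 5, 0) }')   # length of zero

# Brackets make the empty results visible.
printf '[%s] [%s] [%s] [%s]\n' "$with_len" "$no_len" "$past_end" "$zero_len"
# prints: [ing] [ington] [] []
```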

I also suggest combining the grep | awk | sed | tr stages into a single awk script, and debugging that awk script with printouts.

From:

lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10,length)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt

To:

lynx --dump "$link" | awk '/With/{found=1;next} found{found=0; s=substr($0,10); gsub(/ +/," ",s); if (match(s,/.*,/)) s=substr(s,1,RLENGTH-1) " and" substr(s,RLENGTH+1); print s}' >> 2.txt

From:

lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10,length)}' >> dates.txt

To:

lynx --dump "$link" | awk '/Date/{print substr($0,10); exit}' >> dates.txt
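The advice to debug the awk script with printouts could look roughly like the sketch below. The `sample` text is a hypothetical stand-in for one page of `lynx --dump` output (the real indentation may differ), and `dates` is a name chosen for the example.

```shell
#!/bin/bash
# Hypothetical sample input; real lynx output will differ.
sample='   Title: Example
   Date: 2019-05-01
   Date: 2019-05-02'

# The debug line is written to stderr, so it shows up on the terminal
# but never lands in the captured result (or in dates.txt).
# exit after the first match mirrors grep -m 1.
dates=$(printf '%s\n' "$sample" | awk '/Date/ {
    print "DEBUG line " NR ": [" $0 "]" > "/dev/stderr"
    print substr($0, 10)
    exit
}')

printf '%s\n' "$dates"
```

Printing the matched line in brackets makes leading whitespace visible, which helps when choosing the start position passed to substr().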

Upvotes: 2
