Reputation: 1418
OK, so I have the following script to scrape contact details from a list of URLs (urls.txt). When I run the following command directly from the terminal, I get the correct result:
perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' http://url.com
However, when I call the above command from within a script, I get a "No such file or directory" result.
Here is a copy of my script:
#!/bin/bash
while read inputline
do
//Read the url from urls.txt
url="$(echo $inputline)"
//execute saxon-lint to grab the contents of the XPATH from the url within urls.txt
mydata=$("perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' $url ")
//output the result in myfile.csv
echo "$url,$mydata" >> myfile.csv
//wait 4 seconds
sleep 4
//move to the next url
done <urls.txt
I have tried changing the perl to ./ but get the same result.
Can anyone advise where I am going wrong with this, please?
The error that I am receiving is:
./script2.pl: line 6: ./saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' http://find.icaew.com/listings/view/listing_id/20669/avonhurst-chartered-accountants : No such file or directory
Thanks in advance
Upvotes: 0
Views: 5251
Reputation: 95242
You should accept @glennjackman's answer, as that is exactly the problem. This line:
mydata=$("perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' $url ")
is telling the shell to run this command:
"perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' $url "
... including the double quotes. If you type that with the double quotes at the shell prompt, you'll get the same "No such file or directory" error message that you're getting from your script.
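A small sketch of that behaviour, using printf as a stand-in for the saxon-lint command (the variable names are made up for illustration). Note that bash should report "command not found" for a name without a slash; when the quoted string contains a slash, as the URL does here, it skips the PATH search and reports "No such file or directory" instead.

```shell
#!/bin/bash
# Stand-in demo: printf plays the role of the real command.

# Quoted: the whole string "printf %s hello" is taken as ONE command name,
# spaces included, so the lookup fails.
quoted_status=0
quoted_err=$("printf %s hello" 2>&1) || quoted_status=$?

# Unquoted: the shell splits the words into a command (printf) and its arguments.
unquoted_out=$(printf %s hello)

echo "quoted status:   $quoted_status"   # 127 means the command could not be found
echo "quoted error:    $quoted_err"
echo "unquoted output: $unquoted_out"
```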
A couple other notes on the script:
url="$(echo $inputline)"
This is a roundabout way of making a second variable into a copy of the first. A simple url=$inputline would work as well, but you could also just use read url in the first place. Not sure why you need two variables.
//output the result in myfile.csv
echo "$url,$mydata" >> myfile.csv
Be aware that when passing a variable containing user-supplied input as the first argument to echo, you create the possibility of unexpected behavior. In this case, it's a low possibility, since a URL isn't likely to start with a - character, but it's good to get out of the habit; I would use printf. Also, instead of appending each line inside the loop, I would just redirect the output of the loop along with the input:
printf '%s,%s\n' "$url" "$mydata"
[...]
done <urls.txt >>myfile.csv
If you don't expect myfile.csv to exist when the loop starts, or it holds nothing you need to keep, you can change that to a single > and avoid the possibility of messy mixtures of output from different runs.
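Putting the notes above together, the loop might look something like the sketch below. The names scrape, scrape_all and DELAY are invented for this example and are not part of saxon-lint; the perl command itself is copied from the question and is assumed to work as it does at the prompt.

```shell
#!/bin/bash
# Hypothetical wrapper around the command from the question.
scrape() {
    perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' "$1"
}

# Reads URLs from stdin, one per line, and writes "url,data" lines to stdout.
scrape_all() {
    local url mydata
    while read -r url; do
        # Skip blank lines in the input.
        [ -n "$url" ] || continue
        # No quotes around the command itself; only the argument is quoted.
        mydata=$(scrape "$url")
        printf '%s,%s\n' "$url" "$mydata"
        # Pause between requests (DELAY is an assumed knob, defaulting to 4 seconds).
        sleep "${DELAY:-4}"
    done
}

# Intended usage (not run here): scrape_all <urls.txt >>myfile.csv
```

Keeping the redirections at the call site means the same loop can append with >> or start fresh with > without touching its body.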
Upvotes: 1
Reputation: 246744
Don't wrap the whole command in double quotes inside the command substitution.
Not:
mydata=$("perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' $url ")
# .......^...........................................................................................^
But this (quoting only the $url argument is fine, and protects it from word splitting):
mydata=$(perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' "$url")
With the double quotes, you're instructing bash to look for a program named "perl saxon-lint.pl --html etc etc" in the path, spaces and all, and clearly no such program exists.
Upvotes: 6