Reputation: 65
New to bash scripting, though I'm getting fairly familiar with shell scripting. I wrote this text-transform script for a client's feed. It extracts the URLs I want and the titles of the articles. Awesome.
echo $(var=$(curl -L website.com/news)) |
grep -Po '<h3 class="article-link"><a href="\K[^<]+' <<< $var |
result=$(sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g') ; let this=0 ; echo "$result" | while read line ; do if ((this % 2 == 0 )) ; then echo website.com/news$line ; else echo $line ; fi ; let this+=1 ; done
When I try to extract it to a file and run it with bash OR sh myThing.sh, it doesn't work at all. The only thing that gets echoed is 'website.com/news', and when I try to echo $this, all I get is 1. What am I doing wrong?
#!/bin/bash
echo $(var=$(curl -L website.com/news)) |
grep -Po '<h3 class="article-link"><a href="\K[^<]+' <<< $var |
result=$(sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
let this=0
echo "$result" | while read line
do
if ((this % 2 == 0 ))
then
echo website.com/news$line
else
echo $line
fi
let this+=1
done
edit:
#!/bin/bash
var=$(curl -L linux.com/news)
select=$(grep -Po '<h3 class="article-list__title"><a href="\K[^<]+' <<< $var)
result=$(sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
let this=0
echo "$result" | while read line
do
if ((this % 2 == 0 ))
then
echo website.com/news$line
else
echo $line
fi
let this+=1
done
Upvotes: 0
Views: 1288
Reputation: 6758
This line is totally wrong. You are attempting to pass the standard output of each process through pipes, when none of them ever prints anything except standard error.
echo $(var=$(curl -L website.com/news)) | grep -Po '<h3 class="article-link"><a href="\K[^<]+' <<< $var | result=$(sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
I'll break down what I believe you are attempting to do.
echo $(var=$(curl -L website.com/news))
The above code will only print standard error, which is a separate stream from standard output. The standard output is assigned to $var. However, you are attempting to pass the standard output to the next process, and at this point it is nothing but a newline.
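A minimal sketch of that behaviour, assuming a hypothetical fake_curl function that stands in for curl (body on standard output, progress on standard error) so that no network access is needed:

```shell
#!/usr/bin/env bash
# fake_curl is a made-up stand-in for curl, used only for illustration.
fake_curl() {
  echo 'page body'          # standard output
  echo 'progress meter' >&2 # standard error
}

# The inner $(...) captures standard output into var. The assignment itself
# prints nothing, so the outer $(...) expands to an empty string and echo
# emits only a newline. Standard error is not captured and passes straight through.
piped=$(echo $(var=$(fake_curl)))
printf 'what the next pipeline segment would receive: [%s]\n' "$piped"

# Capturing the command directly keeps the body.
body=$(fake_curl)
printf 'captured directly: [%s]\n' "$body"
```

The first line of output shows empty brackets, because the trailing newline from echo is all that was ever on standard output.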
grep -Po '<h3 class="article-link"><a href="\K[^<]+' <<< $var
The here-string <<< takes precedence over the pipe. But the variable $var is lost, as it was defined inside a subshell and not in the parent shell. Thanks to @mklement0.
The proper way to accomplish all this is to not use $var. All you wanted is the value stored in $result.
result=$(curl -L website.com/news | grep -Po '<h3 class="article-link"><a href="\K[^<]+'| sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
I don't intend to optimize your script. This is more of a suggested solution. A more comprehensive answer to your question, "Why is my shell command working at the prompt, but not as a bash script?", is given by mklement0.
Upvotes: 1
Reputation: 437823
This answer solves the OP's specific problem, but to address the question "Why is my shell command working at the prompt, but not as a bash script?" generally, Etan Reisner provides an excellent answer in the comments:
"You are either not running that exact command or it "works" because you have shell state that is affecting things in ways you take to be "working" and your script doesn't have that state. Try launching an entirely new shell session and see if that command, on its own, works for you there."
echo $(var=...) will assign a value to variable $var, but will not output anything, so the echo command will simply print a newline.
Furthermore, because the assignment to $var happens inside $(...) (a command substitution), it is confined to the subshell that the command inside the substitution ran in, so $var will not be defined in the calling shell.
(A subshell is a child process that contains a duplicate of the current shell's environment, without being able to modify the current shell's environment).
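The scoping rule can be checked with a short sketch. The sample strings are made up for illustration:

```shell
#!/usr/bin/env bash
unset var

# The assignment happens inside the command substitution's subshell,
# where it is visible for the rest of that subshell...
echo "$(var='set in subshell'; echo "inside: $var")"

# ...but the calling shell never sees it.
after=${var-unset}
echo "after:  $after"

# Assigning the substitution's output in the calling shell does persist.
var=$(echo 'set in calling shell')
echo "direct: $var"
```

The ${var-unset} expansion substitutes the word "unset" only when var is not defined, which makes the missing assignment visible.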
More generally, you cannot meaningfully define variables inside a pipeline - they will neither be visible to other pipeline segments, nor after the pipeline finishes.[1]
The only reason your [original] command could ever have worked is if $var had a preexisting value in your shell.
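A rough sketch of how leftover session state can make a command appear to work. It assumes a variable that was set, but not exported, during earlier experiments; the sample value is invented:

```shell
#!/usr/bin/env bash
# Hypothetical leftover from earlier interactive experiments (not exported).
var='<a href="/some-article">'

# In the current session the here-string has data, so grep appears to "work".
in_session=$(grep -c 'href' <<< "$var")

# A fresh bash process does not inherit unexported variables, so the very same
# command finds nothing there, just as it would inside a script.
# (grep exits non-zero when nothing matches, hence the "|| true".)
in_fresh_shell=$(bash -c 'grep -c "href" <<< "$var"' || true)

echo "current session: $in_session matching line(s)"
echo "fresh shell:     $in_fresh_shell matching line(s)"
```

Running a command through bash -c, or in a newly opened terminal, is a quick way to test whether it depends on state that a script will not have.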
In fact, given that you provide input to grep via a here-string (<<<), the first segment of your pipeline (echo ...) is entirely ignored.
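This is easy to verify with cat in place of grep. The strings are placeholders:

```shell
#!/usr/bin/env bash
# Both the pipe and the here-string try to supply cat's standard input.
# Bash connects the pipe first and applies the command's own redirections
# afterwards, so the here-string replaces the pipe as stdin.
seen=$(echo 'from the pipe' | cat <<< 'from the here-string')
echo "cat read: $seen"
```

Only the here-string text reaches cat; whatever the left-hand side of the pipe writes is never read.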
To pass the output of curl through the pipeline to grep and then to sed, no intermediate variables are needed at all.
Furthermore, your sed command is lacking input: you probably meant to feed it $var in your first attempt, and $select in the 2nd (your 2nd attempt came close to a correct solution).
What you were probably ultimately looking for:
result=$(curl -L website.com/news |
grep -Po '<h3 class="article-link"><a href="\K[^<]+' |
sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
# ... processing of "$result"
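As a sketch only, the processing step might look like the following. The sample value of result is invented and assumes the pipeline leaves alternating lines of article path and article title; process_result is a hypothetical helper name, not something from the original script:

```shell
#!/usr/bin/env bash
# Invented sample of what "$result" might contain: path, title, path, title...
result='/first-article
First Article Title
/second-article
Second Article Title'

# Hypothetical helper: reads lines from stdin and prefixes every other line.
process_result() {
  local this=0 line
  while IFS= read -r line; do
    if (( this % 2 == 0 )); then
      echo "website.com/news$line"  # even-numbered lines hold the path
    else
      echo "$line"                  # odd-numbered lines hold the title
    fi
    (( ++this ))
  done
}

process_result <<< "$result"
```

Because the function reads standard input, it could also sit at the end of the curl | grep | sed pipeline, in which case $result would not be needed at all.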
Some additional notes:
- Combine the sed calls into a single one.
- Pipe directly into the while loop, without the need for intermediate variable $result.
- Double-quote variable references, e.g. "$line" instead of $line, to protect them from interpretation by the shell (word-splitting, globbing).
- let this+=1 is better expressed as (( ++this )) in modern Bash.
- Make sure the script is run with bash.
[1] All commands involved in a pipeline by default run in a subshell in bash, so they all see copies of the parent shell's variables. Bash 4.2+ offers the lastpipe option (off by default) to allow you to create variables in the current shell instead of in a subshell, by running the last pipeline segment (only) in the current shell, to facilitate scenarios such as ... | while read -r line ... and have $line continue to exist after the pipeline finishes.
Note that this still doesn't enable defining a variable in an earlier pipeline segment in the hopes that a later segment will see it - this can never work, because the commands that make up a pipeline are launched at the same time, and it is only through coordination of the input and output streams that effective left-to-right processing happens.
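The effect of lastpipe can be sketched with a counter that is incremented in the last pipeline segment. This assumes Bash 4.2 or later and that the code runs as a script, since lastpipe only takes effect when job control is off (the default for scripts, but not for interactive shells):

```shell
#!/usr/bin/env bash
# Default: the while loop runs in a subshell, so its increments are lost.
count=0
printf 'a\nb\nc\n' | while IFS= read -r line; do (( ++count )); done
default_count=$count

# With lastpipe, the last segment runs in the current shell instead.
shopt -s lastpipe
count=0
printf 'a\nb\nc\n' | while IFS= read -r line; do (( ++count )); done
lastpipe_count=$count

echo "without lastpipe: $default_count"
echo "with lastpipe:    $lastpipe_count"
```

Under those assumptions the first count stays at 0 and the second reaches 3. Variables set in the printf segment would still be invisible to the loop in both cases.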
Upvotes: 3