d3pd

Reputation: 8315

In Bash, how can a here-document contain a variable and then be stored in a variable?

I have something like the following in a Bash script:

URL="${1}"
IFS= read -d '' code << "EOF"
import urllib2
from BeautifulSoup import BeautifulSoup
page = BeautifulSoup(urllib2.urlopen("${URL}"))
images = page.findAll('img')
for image in images:
    print(image["src"])
EOF
python <(echo "${code}")

How can I change the way the here-document is defined (e.g. without using read) so that the variable ${URL} is expanded inside the here-document and the result is then stored in the variable ${code}? At present, the here-document is stored in the variable successfully, but the variable inside it is not expanded.

Upvotes: 2

Views: 149

Answers (2)

Charles Duffy

Reputation: 295308

I don't intend to override or replace @anubhava's answer to the literal question: that answer is entirely correct, and where the document being substituted into is not source code, its usage is entirely appropriate.


Substituting variables into code (whether in a heredoc or otherwise) is a rather dangerous practice: you risk running into a cousin of Bobby Tables, i.e. a code-injection bug analogous to SQL injection.

Much better is to pass the variable out-of-band, so that it can never be parsed as code. In awk, this is done with -v key=val; for Python, one easy way is to use the environment:

export URL="${1}"
IFS= read -d '' code << "EOF"
import urllib2, os
from BeautifulSoup import BeautifulSoup
page = BeautifulSoup(urllib2.urlopen(os.environ['URL']))
images = page.findAll('img')
for image in images:
    print(image["src"])
EOF
python <(echo "${code}")

The changes from your original code:

  • The use of export when assigning URL
  • The import os in the Python
  • The reference to os.environ['URL'] in the Python.
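The same out-of-band principle applies to the awk case mentioned above. Below is a minimal sketch, assuming bash and any POSIX awk; the variable names payload, via_v and via_env are illustrative only. It passes the hostile string from the example in this answer both with -v and through the environment (ENVIRON is awk's counterpart to Python's os.environ), and in both cases awk only ever sees it as data:

```shell
#!/usr/bin/env bash
# Hostile "URL" from this answer's example; it is only printed, never executed.
payload="\"+__import__('shutil').rmtree('/')+\""

# Out-of-band via -v: awk receives the value as a variable, never as program text.
# (Caveat: awk processes backslash escapes in -v values; this payload has none.)
via_v=$(awk -v url="$payload" 'BEGIN { print url }')

# Out-of-band via the environment: the awk analogue of os.environ['URL'].
via_env=$(URL="$payload" awk 'BEGIN { print ENVIRON["URL"] }')

# Both lines print the payload verbatim, as plain data.
printf '%s\n' "$via_v" "$via_env"
```

One design note: awk interprets backslash escape sequences in -v values, whereas ENVIRON passes the string through unchanged, so the environment route is the safer default for arbitrary input.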

As for why this approach is preferable: consider what would happen if you were processing a URL containing the string "+__import__('shutil').rmtree('/')+". Running

page = BeautifulSoup(urllib2.urlopen(""+__import__('shutil').rmtree('/')+""))

...is probably not going to have the effect you intend.
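To see the problem without running anything dangerous, here is a minimal sketch (assuming bash) that builds the same line with an unquoted here-document and only prints the generated Python source; nothing is executed. It uses read -r and || true, which are not in the original code: -r keeps backslashes literal, and || true absorbs the non-zero status read returns when it reaches end of input without finding a NUL delimiter.

```shell
#!/usr/bin/env bash
# Same hostile "URL"; the generated Python is printed for inspection, not run.
URL="\"+__import__('shutil').rmtree('/')+\""

# Unquoted delimiter: ${URL} is spliced directly into the Python source text.
IFS= read -r -d '' code <<EOF || true
page = BeautifulSoup(urllib2.urlopen("${URL}"))
EOF

# Show exactly what Python would be asked to parse.
printf '%s' "$code"
```

The printed line is identical to the one shown above: the quotes in the payload close the string literal early, so Python would parse rmtree as a call rather than as part of a URL.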

Upvotes: 0

anubhava

Reputation: 784998

Remove the quotes from EOF:

URL="${1}"
IFS= read -d '' code <<EOF
import urllib2
from BeautifulSoup import BeautifulSoup
page = BeautifulSoup(urllib2.urlopen("${URL}"))
images = page.findAll('img')
for image in images:
    print(image["src"])
EOF
python <(echo "${code}")

As per man bash:

If any characters in word are quoted, the delimiter is the result of quote removal on word, and the lines in the here-document are not expanded.
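A small demonstration of the rule quoted above, assuming bash (the URL value is a placeholder). It also sketches one way to avoid read, as the question asked: wrapping cat and the here-document in a command substitution. Two details go beyond the code in this answer: read -r stops backslashes in the body from being treated as escapes, and || true is there because read returns a non-zero status when it hits end of input without finding the NUL delimiter that -d '' asks for.

```shell
#!/usr/bin/env bash
URL='http://example.com/'

# Quoted delimiter ('EOF'): the body is taken literally, ${URL} stays as text.
IFS= read -r -d '' quoted <<'EOF' || true
urlopen("${URL}")
EOF

# Unquoted delimiter (EOF): parameter expansion happens, ${URL} is substituted.
IFS= read -r -d '' unquoted <<EOF || true
urlopen("${URL}")
EOF

# Alternative without read: command substitution around cat.
# $(...) strips trailing newlines from the captured text.
code=$(cat <<EOF
urlopen("${URL}")
EOF
)

printf 'quoted:   %s' "$quoted"     # quoted:   urlopen("${URL}")
printf 'unquoted: %s' "$unquoted"   # unquoted: urlopen("http://example.com/")
printf 'cat:      %s\n' "$code"     # cat:      urlopen("http://example.com/")
```

Bear in mind that with an unquoted delimiter, backslashes, $ and backticks in the body are also interpreted by bash, so embedded code that uses them must escape them; the $(cat ...) form additionally drops the trailing newline that the read form preserves.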

Upvotes: 4
