Goose

Reputation: 2250

How to add a variable to a URL parameter in urllib

I am trying to visit this URL:

http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv

But instead of the ticker always being GOOG, it should be whatever is read into the tickers_list variable.

When I do this, it works:

import urllib.request

URL = urllib.request.urlopen("http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv")
html = URL.read()
print (html)

But if I do this:

import urllib.request

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    data = f.readlines()    # Read the data from the file

tickers_list = []
for line in data:
    tickers_list.append(line)   # Separate tickers into individual elements in list

print (tickers_list[0]) # Check if printing correct ticker
url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % str(tickers_list[0])
print (url) # Check if printing correct URL

URL = urllib.request.urlopen(url)
html = URL.read()
print (html)

It gives me this error:

urllib.error.URLError: <urlopen error no host given>

Am I not doing the string formatting correctly?

Upvotes: 1

Views: 3788

Answers (2)

spinus

Reputation: 5765

For manipulating URLs in Python I would suggest two libraries: furl or URLObject. Both give you a very nice interface for manipulating URLs with ease.

Example from furl documentation:

>>> from furl import furl
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.args['three'] = '3'
>>> del f.args['one']
>>> f.url
'http://www.google.com/?two=2&three=3'
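
Applied to the URL from the question, a minimal sketch (assuming furl is installed, e.g. with pip) could look like this:

from furl import furl

f = furl("http://ichart.finance.yahoo.com/table.csv")
f.args["s"] = "GOOG"        # or a ticker read from a file, e.g. tickers_list[0].strip()
f.args["a"] = "05"
f.args["b"] = "20"
f.args["c"] = "2013"
f.args["d"] = "05"
f.args["e"] = "28"
f.args["f"] = "2013"
f.args["g"] = "d"
f.args["ignore"] = ".csv"
print(f.url)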

Upvotes: 2

Martijn Pieters

Reputation: 1121494

The data you are reading from the file includes the newline at the end of each line (.readlines() does not remove it). You need to remove this yourself; str.strip() removes leading and trailing whitespace, including newlines:

import urllib.request

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    tickers_list = f.readlines()    # .readlines() returns a list *already*

print(tickers_list[0].strip())
url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % tickers_list[0].strip()
print(url)

response = urllib.request.urlopen(url)
html = response.read()
print(html)

You do not need to call str() on tickers_list[0], because reading from a file already gives you strings. Moreover, the %s formatting placeholder converts its value to a string if it is not one already.
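
For example, %s happily accepts a non-string value:

>>> "s=%s" % 42
's=42'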

With a newline (\n character in the repr() output below), you get the exact error you see:

>>> url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % 'GOOG\n'
>>> print(repr(url))
'http://ichart.finance.yahoo.com/table.csv?s=GOOG\n&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv'
>>> urllib.request.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 467, in open
    req = meth(req)
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 1172, in do_request_
    raise URLError('no host given')
urllib.error.URLError: <urlopen error no host given>

If you are going to process just one line from the file input, use f.readline() to read that one line and save yourself having to index a list. You still need to strip off the newline.
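
A minimal sketch of that single-line variant, reusing the prompt from the question, could look like this:

import urllib.request

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    ticker = f.readline().strip()   # read only the first line and drop the trailing newline

url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % ticker
response = urllib.request.urlopen(url)
print(response.read())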

If you are going to process all lines, just loop directly over the file object, which yields each line separately, again including the trailing newline:

with open(filename) as f:
    for ticker_name in f:
        ticker_name = ticker_name.strip()
        url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % ticker_name

        # etc.
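
As a side note, instead of % formatting you could build the query string with urllib.parse.urlencode, which escapes the values for you; a minimal sketch using the ticker_name from the loop above:

import urllib.parse

params = [
    ("s", ticker_name),   # ticker_name comes from the loop above
    ("a", "00"), ("b", "1"), ("c", "2011"),
    ("d", "05"), ("e", "28"), ("f", "2013"),
    ("g", "d"), ("ignore", ".csv"),
]
url = "http://ichart.finance.yahoo.com/table.csv?" + urllib.parse.urlencode(params)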

Upvotes: 2
