Goose

Reputation: 2250

How to add a variable to a URL parameter in urllib

I am trying to visit this URL:

http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv

But instead of the ticker always being GOOG, it should be whatever is read into the tickers_list variable.

When I do this, it works:

import urllib.request

URL = urllib.request.urlopen("http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv")
html = URL.read()
print (html)

But if I do this:

import urllib.request

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    data = f.readlines()    # Read the data from the file

tickers_list = []
for line in data:
    tickers_list.append(line)   # Separate tickers into individual elements in list

print (tickers_list[0]) # Check if printing correct ticker
url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % str(tickers_list[0])
print (url) # Check if printing correct URL

URL = urllib.request.urlopen(url)
html = URL.read()
print (html)

It gives me this error:

urllib.error.URLError: <urlopen error no host given>

Am I not doing the string formatting correctly?

Upvotes: 1

Views: 3788

Answers (2)

spinus

Reputation: 5765

For manipulating URLs in Python I would suggest two libraries: furl or URLObject. Both give you a very nice interface for manipulating URLs with ease.

Example from furl documentation:

>>> from furl import furl
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.args['three'] = '3'
>>> del f.args['one']
>>> f.url
'http://www.google.com/?two=2&three=3'
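
Applied to the URL from the question, a minimal sketch (assuming furl is installed, e.g. with pip) could look like this:

from furl import furl

f = furl("http://ichart.finance.yahoo.com/table.csv")
f.args["s"] = "GOOG"        # or a ticker read from a file, e.g. tickers_list[0].strip()
f.args["a"] = "05"
f.args["b"] = "20"
f.args["c"] = "2013"
f.args["d"] = "05"
f.args["e"] = "28"
f.args["f"] = "2013"
f.args["g"] = "d"
f.args["ignore"] = ".csv"
print(f.url)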

Upvotes: 2

Martijn Pieters

Reputation: 1121494

The data you are reading from the file includes the newline at the end of each line (.readlines() does not remove it). You need to remove this yourself; str.strip() removes leading and trailing whitespace, including newlines:

import urllib.request

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    tickers_list = f.readlines()    # .readlines() returns a list *already*

print(tickers_list[0].strip())
url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % tickers_list[0].strip()
print(url)

response = urllib.request.urlopen(url)
html = response.read()
print(html)

You do not need to call str() on tickers_list[0], because reading from a file already gives you strings. Moreover, the %s formatting placeholder converts its value to a string if it is not one already.
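
For example, %s happily accepts a non-string value:

>>> "s=%s" % 42
's=42'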

With a newline (\n character in the repr() output below), you get the exact error you see:

>>> url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % 'GOOG\n'
>>> print(repr(url))
'http://ichart.finance.yahoo.com/table.csv?s=GOOG\n&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv'
>>> urllib.request.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 467, in open
    req = meth(req)
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 1172, in do_request_
    raise URLError('no host given')
urllib.error.URLError: <urlopen error no host given>

If you are going to process just one line from the file input, use f.readline() to read that one line and save yourself having to index a list. You still need to strip off the newline.
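
A minimal sketch of that single-line variant, reusing the prompt from the question, could look like this:

import urllib.request

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    ticker = f.readline().strip()   # read only the first line and drop the trailing newline

url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % ticker
response = urllib.request.urlopen(url)
print(response.read())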

If you are going to process all lines, just loop directly over the file object, which yields each line separately, again including the trailing newline:

with open(filename) as f:
    for ticker_name in f:
        ticker_name = ticker_name.strip()
        url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % ticker_name

        # etc.
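
As a side note, instead of % formatting you could build the query string with urllib.parse.urlencode, which escapes the values for you; a minimal sketch using the ticker_name from the loop above:

import urllib.parse

params = [
    ("s", ticker_name),   # ticker_name comes from the loop above
    ("a", "00"), ("b", "1"), ("c", "2011"),
    ("d", "05"), ("e", "28"), ("f", "2013"),
    ("g", "d"), ("ignore", ".csv"),
]
url = "http://ichart.finance.yahoo.com/table.csv?" + urllib.parse.urlencode(params)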

Upvotes: 2
