Reputation: 2250
I am trying to visit this URL:
http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv
But instead of always being GOOG, the ticker should be whatever is read into the variable tickers_list.
When I do this, it works:
URL = urllib.request.urlopen("http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv")
html = URL.read()
print (html)
But if I do this:
filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    data = f.readlines() # Read the data from the file
tickers_list = []
for line in data:
    tickers_list.append(line) # Separate tickers into individual elements in list
print (tickers_list[0]) # Check if printing correct ticker
url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % str(tickers_list[0])
print (url) # Check if printing correct URL
URL = urllib.request.urlopen(url)
html = URL.read()
print (html)
And gives me this error:
urllib.error.URLError: <urlopen error no host given>
Am I not doing the string formatting correctly?
Upvotes: 1
Views: 3788
Reputation: 5765
For manipulating URLs in Python I would suggest two libraries: furl or URLObject. Both give you a very nice interface for manipulating URLs with ease.
Example from the furl documentation:
>>> from furl import furl
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.args['three'] = '3'
>>> del f.args['one']
>>> f.url
'http://www.google.com/?two=2&three=3'
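Applied to the URL from the question, a minimal sketch along the same lines (assuming furl is installed; the AAPL ticker and the variable names are just example values) could look like:
from furl import furl

# Start from the question's URL and swap the ticker stored in the s= query argument.
f = furl('http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv')
f.args['s'] = 'AAPL'  # e.g. a ticker read from the file, stripped of its newline
print(f.url)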
Upvotes: 2
Reputation: 1121494
The data you are reading from the file includes the newline at the end of each line (.readlines() does not remove it). You need to remove this yourself; str.strip() removes all leading and trailing whitespace, including the newline:
import urllib.request

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    tickers_list = f.readlines()  # .readlines() returns a list *already*
print(tickers_list[0].strip())
url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % tickers_list[0].strip()
print(url)
response = urllib.request.urlopen(url)
html = response.read()
print(html)
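A quick interactive check of what str.strip() does to such a value (just an illustration; 'GOOG\n' stands in for whatever the file actually contains):
>>> 'GOOG\n'.strip()
'GOOG'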
You do not need to call str() on tickers_list[0], because reading from a file already gives you strings. Moreover, the %s formatting placeholder converts its value to a string if it is not one already.
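As a quick illustration of that conversion (an interactive session unrelated to the question's data):
>>> "s=%s&a=00" % 42
's=42&a=00'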
With a newline (the \n character in the repr() output below), you get the exact error you see:
>>> url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % 'GOOG\n'
>>> print(repr(url))
'http://ichart.finance.yahoo.com/table.csv?s=GOOG\n&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv'
>>> urllib.request.urlopen(url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 156, in urlopen
return opener.open(url, data, timeout)
File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 467, in open
req = meth(req)
File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 1172, in do_request_
raise URLError('no host given')
urllib.error.URLError: <urlopen error no host given>
If you are going to process just one line from the file, use f.readline() to read that single line and save yourself having to index into a list. You still need to strip off the newline.
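A minimal sketch of that single-line variant (the filename and ticker variable names here are just illustrative):
with open(filename) as f:
    ticker = f.readline().strip()  # read only the first line and drop the trailing newline

url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % ticker
print(url)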
If you are going to process all lines, just loop directly over the input file, which yields each line separately, again with the newline:
with open(filename) as f:
    for ticker_name in f:
        ticker_name = ticker_name.strip()
        url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % ticker_name
        # etc.
Upvotes: 2