Reputation: 2002
This is my code to access a webpage, but I need to add two parameters to the URL:
1. The first parameter comes from a line read from a file.
2. The second parameter is a counter, so that successive pages are accessed.
import urllib2
import json,os
f = open('codes','r')
for line in f.readlines():
    id = line.strip('\n')
    url = 'http://api.opencorporates.com/v0.2/companies/search?q=&jurisdiction_code=%s&per_page=26&current_status=Active&page=%d'
    i = 0
    directory = id
    os.makedirs(directory)
    while True:
        i += 5
        req = urllib2.Request('%s%s%d' % (url,id, i))
        print req
        try:
            response = urllib2.urlopen('%s%s%d' % (url, id, i))
        except urllib2.HTTPError, e:
            break
        content = response.read()
        fo = str(i) + '.json'
        OUTFILE = os.path.join(directory, fo)
        with open(OUTFILE, 'w') as f:
            f.write(content)
This keeps creating empty directories. I know something is wrong with the url parameters. How to rectify this?
Upvotes: 0
Views: 1108
Reputation: 8615
It looks like what you want to do is to insert id and i into url, but the string formatting you're using here just concatenates url, id, and i, leaving the %s and %d placeholders inside url untouched. Try changing this:
req = urllib2.Request('%s%s%d' % (url,id, i))
Into this:
req = urllib2.Request(url % (id, i))
Does that give you the result you want?
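To see why the original code leaves empty directories, it helps to print both expressions side by side without touching the network. This sketch uses the question's URL template with a hypothetical jurisdiction code 'gb' and page 5; the name code stands in for the question's id. It runs unchanged on Python 2.7 and 3.

```python
# The URL template from the question, with %s and %d placeholders.
url = ('http://api.opencorporates.com/v0.2/companies/search'
       '?q=&jurisdiction_code=%s&per_page=26&current_status=Active&page=%d')
code = 'gb'  # hypothetical value standing in for a line read from 'codes'
i = 5

# Original expression: the template is inserted verbatim, so its literal
# "%s" and "%d" survive and the code and counter are appended at the end.
concatenated = '%s%s%d' % (url, code, i)

# Fixed expression: the values fill the placeholders inside the template.
substituted = url % (code, i)

print(concatenated)
print(substituted)
```

The first string still contains the literal %s and %d, so the request presumably fails with an HTTPError, the loop breaks immediately, and only the freshly created directory is left behind.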
Also, the %-style string formatting you are using is the older syntax; the newer syntax, described in PEP 3101 -- Advanced String Formatting, is generally preferred. So even better would be to do:
url = 'http://api.opencorporates.com/v0.2/companies/search?q=&jurisdiction_code={0}&per_page=26&current_status=Active&page={1}'
...
req = urllib2.Request(url.format(id, i))
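Putting the fix back into the question's loop could look roughly like the sketch below. It is written for Python 3, where urllib.request and urllib.error replace urllib2. The names save_pages, default_fetch, and the fetch and out_dir parameters are hypothetical: fetch is injectable only so the loop can be tried offline. Two small changes beyond the answer are assumptions on my part: os.makedirs gets exist_ok=True so re-running the script does not raise, and the input and output file handles get separate names for readability.

```python
import os
import urllib.request
from urllib.error import HTTPError

# str.format template from the answer: {0} = jurisdiction code, {1} = page number.
URL = ('http://api.opencorporates.com/v0.2/companies/search'
       '?q=&jurisdiction_code={0}&per_page=26&current_status=Active&page={1}')


def default_fetch(url):
    """Download url and return the body as bytes; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def save_pages(codes_path, fetch=default_fetch, out_dir='.', step=5):
    """Save pages step, 2*step, ... for every code until a request fails."""
    with open(codes_path) as codes:               # input handle gets its own name
        for line in codes:
            code = line.strip()
            if not code:
                continue                           # ignore blank lines
            directory = os.path.join(out_dir, code)
            os.makedirs(directory, exist_ok=True)  # no error if it already exists
            page = 0
            while True:
                page += step                       # mirrors the question's i += 5
                try:
                    content = fetch(URL.format(code, page))
                except HTTPError:
                    break                          # stop paging for this code
                out_path = os.path.join(directory, '{0}.json'.format(page))
                with open(out_path, 'wb') as out:  # fetch returns bytes
                    out.write(content)
```

In real use you would call save_pages('codes') and let default_fetch hit the API; passing a fake fetch lets you check the URLs and file layout without a network connection.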
Instead of %s and %d, you use curly braces ({}) as placeholders for your parameters. Inside the curly braces, you can put the index of a positional argument:
>>> 'I like to {0}, {0}, {0}, {1} and {2}'.format('eat', 'apples', 'bananas')
'I like to eat, eat, eat, apples and bananas'
If you just use bare curly braces, each placeholder consumes one parameter, and extras are ignored; e.g.:
>>> '{} and {} and {}'.format(1, 2, 3)
'1 and 2 and 3'
>>> '{} and {} and {}'.format(1, 2, 3, 4)
'1 and 2 and 3'
>>> '{} and {} and {}'.format(1, 2)
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    '{} and {} and {}'.format(1, 2)
IndexError: tuple index out of range
You can also use keyword arguments, and therefore dictionary unpacking:
>>> d = {'adj':'funky', 'noun':'cheese', 'pronoun':'him'}
>>> 'The {adj} {noun} intrigued {pronoun}.'.format(**d)
'The funky cheese intrigued him.'
There are more features, detailed in the PEP, if you're interested.
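Keyword placeholders also work for the URL from the question and make the template easier to read. A minimal sketch; the placeholder names code and page and the values 'gb' and 5 are chosen here for illustration only. It runs on Python 2.7 and 3.

```python
# Named placeholders make the URL template self-documenting.
URL_TEMPLATE = ('http://api.opencorporates.com/v0.2/companies/search'
                '?q=&jurisdiction_code={code}&per_page=26'
                '&current_status=Active&page={page}')

params = {'code': 'gb', 'page': 5}   # hypothetical values for illustration

# Keyword arguments and dictionary unpacking produce the same URL.
by_keyword = URL_TEMPLATE.format(code='gb', page=5)
by_unpacking = URL_TEMPLATE.format(**params)
print(by_keyword)
```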
Upvotes: 2
Reputation: 1490
You need to change these bits:
'%s%s%d' % (url,id, i)
To this:
url % (id, i)
What you're doing now is creating a string like '<url><id><i>', with the %s and %d placeholders still inside <url>, instead of substituting id and i into those placeholders.
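A different approach, not taken by either answer, is to let the standard library build the query string, which also percent-escapes values containing spaces or &. The function is urllib.urlencode in Python 2 and urllib.parse.urlencode in Python 3; passing a list of pairs keeps the parameters in the given order. The helper name search_url is hypothetical.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

BASE = 'http://api.opencorporates.com/v0.2/companies/search'


def search_url(code, page):
    """Build the search URL for one jurisdiction code and page (hypothetical helper)."""
    query = urlencode([
        ('q', ''),
        ('jurisdiction_code', code),
        ('per_page', 26),
        ('current_status', 'Active'),
        ('page', page),
    ])
    return BASE + '?' + query
```

For example, search_url('gb', 5) yields the same URL as the %-substitution fix, and a code such as 'a b&c' would be escaped to a+b%26c rather than breaking the query string.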
Upvotes: 0