blackmamba

Reputation: 2002

Add url parameters in python

This is my code to access a webpage, but I need to add two parameters to the URL:

1. The first parameter is read from a file, one line at a time.
2. The second parameter is a counter, so that I can keep requesting successive pages.

import urllib2
import json,os

f = open('codes','r')
for line in f.readlines():
    id = line.strip('\n')
    url = 'http://api.opencorporates.com/v0.2/companies/search?q=&jurisdiction_code=%s&per_page=26&current_status=Active&page=%d' 
    i = 0
    directory = id
    os.makedirs(directory)
    while True:
       i += 5
       req = urllib2.Request('%s%s%d' % (url,id, i))
       print req
       try:
          response = urllib2.urlopen('%s%s%d' % (url, id, i))
       except urllib2.HTTPError, e:
          break
       content = response.read()
       fo = str(i) + '.json'    
       OUTFILE = os.path.join(directory, fo)
       with open(OUTFILE, 'w') as f:
           f.write(content)

This keeps creating empty directories. I know something is wrong with the URL parameters. How can I fix this?

Upvotes: 0

Views: 1108

Answers (2)

Air

Reputation: 8615

It looks like what you want to do is to insert id and i into url, but the string formatting you're using here concatenates url, id, and i. Try changing this:

req = urllib2.Request('%s%s%d' % (url,id, i))

Into this:

req = urllib2.Request(url % (id, i))

Does that give you the result you want?
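To make the difference concrete, here is a minimal sketch. The template is a shortened, made-up stand-in for the real URL, and `us_ca` is only a placeholder for whatever a line of your file contains:

```python
# Hypothetical, shortened template; the real URL has more query parameters.
template = 'http://example.com/search?code=%s&page=%d'
code = 'us_ca'  # placeholder standing in for one line read from the file
page = 5

# Concatenation: the template is treated as just another value, so its
# own %s and %d placeholders are left untouched and the values are
# appended to the end of the string.
concatenated = '%s%s%d' % (template, code, page)

# Substitution: the template itself is the format string, so the values
# fill its placeholders.
substituted = template % (code, page)

print(concatenated)
print(substituted)
```

The first string still contains a literal `%s` and `%d`, which is most likely why the request fails and the loop breaks before anything is written.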

Also, the % operator you are using is the older string formatting style. It still works and is not formally deprecated, but the newer str.format() method is described in PEP 3101 -- Advanced String Formatting. So you could instead do:

url = 'http://api.opencorporates.com/v0.2/companies/search?q=&jurisdiction_code={0}&per_page=26&current_status=Active&page={1}'
...
req = urllib2.Request(url.format(id, i))

Instead of %s and %d you use curly braces ({}) as placeholders for your parameters. Inside the curly braces, you can put the positional index of the argument to insert:

>>> 'I like to {0}, {0}, {0}, {1} and {2}'.format('eat', 'apples', 'bananas')
'I like to eat, eat, eat, apples and bananas'

If you just use bare curly braces (Python 2.7 and later), each placeholder consumes one argument in order. Extra arguments are ignored, while too few raise an IndexError; e.g.:

>>> '{} and {} and {}'.format(1, 2, 3)
'1 and 2 and 3'
>>> '{} and {} and {}'.format(1, 2, 3, 4)
'1 and 2 and 3'
>>> '{} and {} and {}'.format(1, 2)

Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    '{} and {} and {}'.format(1, 2)
IndexError: tuple index out of range

You can also use keyword arguments, and therefore dictionary unpacking:

>>> d = {'adj':'funky', 'noun':'cheese', 'pronoun':'him'}
>>> 'The {adj} {noun} intrigued {pronoun}.'.format(**d)
'The funky cheese intrigued him.'

There are more features, detailed in the PEP, if you're interested.
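Applied to the URL from the question, named placeholders might look like the sketch below. The field names `code` and `page` and the helper `build_url` are my own choices for illustration, not part of the API or of your code:

```python
# URL from the question, with named placeholders instead of %s and %d.
# The names 'code' and 'page' are arbitrary labels chosen for this sketch.
url = ('http://api.opencorporates.com/v0.2/companies/search'
       '?q=&jurisdiction_code={code}&per_page=26'
       '&current_status=Active&page={page}')


def build_url(code, page):
    """Return the search URL for one jurisdiction code and page number."""
    return url.format(code=code, page=page)


# 'us_ca' is a placeholder for a line read from the codes file.
print(build_url('us_ca', 5))
```

With named fields the argument order no longer matters, which makes it harder to swap the code and the page counter by accident.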

Upvotes: 2

willy

Reputation: 1490

You need to change these bits:

    '%s%s%d' % (url,id, i)

To this:

    url % (id, i)

What you're doing now is creating a string like '<url><id><i>', with the values appended to the end, instead of substituting them into the placeholders inside the URL.

Upvotes: 0
