wan mohd payed

Reputation: 171

How to filter and take only one download link?

I have this code:

import urllib
from bs4 import BeautifulSoup

url = "http://www.microsoft.com/en-us/download/confirmation.aspx?id=17851"
pageurl = urllib.urlopen(url)
soup = BeautifulSoup(pageurl)

for d in soup.select("p.start-download [href]"):
        print d['href']

When I run this code, it gives me many download links. How can I take only one of them?

Upvotes: 0

Views: 96

Answers (2)

Nafiul Islam

Reputation: 82450

Your code only prints the links, so you cannot use them anywhere else in the program. Collect them in a list instead:

import urllib
from bs4 import BeautifulSoup

url = "http://www.microsoft.com/en-us/download/confirmation.aspx?id=17851"
pageurl = urllib.urlopen(url)
soup = BeautifulSoup(pageurl)

urls = []
for d in soup.select("p.start-download [href]"):
    urls.append(d.attrs['href'])

print urls[0]

With the code above, you can use the links themselves in other parts of the program. You can also do this with a list comprehension:

urls = [d['href'] for d in soup.select("p.start-download [href]")]

print urls[0]

You can then iterate over urls to find the URL you want, or just use an index to get your link. Either way, this is more flexible than just printing a link. For example, you might not want the full installation but some other package, or the package for XP instead of the one for Vista, 7 and 8 (using your URLs here as an example).
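
To make the "iterate over urls" step concrete, here is a minimal sketch, written in Python 3 syntax. The helper name pick_url, its keyword parameter, and the sample URLs are hypothetical and chosen only for illustration; in practice urls would be the list built above.

```python
def pick_url(urls, keyword):
    """Return the first URL containing keyword (case-insensitive), or None."""
    keyword = keyword.lower()
    for u in urls:
        if keyword in u.lower():
            return u
    return None


# Hypothetical sample data standing in for the scraped list.
sample_urls = [
    "http://example.com/files/setup_full.exe",
    "http://example.com/files/setup_xp.exe",
]

print(pick_url(sample_urls, "xp"))
```

Returning None when nothing matches lets the caller decide what to do if the page layout changes and the expected link is missing.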

Upvotes: 2

yemu

Reputation: 28259

for d in soup.select("p.start-download [href]"):
        print d['href']
        break

This stops the loop after printing the first link.
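
A variant of the same idea without an explicit loop, sketched in Python 3 syntax: next() with a default returns the first item, or None when there are no matches. The helper name first_href is hypothetical, and plain dicts stand in for the tags that soup.select(...) would return, since both support tag["href"] lookup.

```python
def first_href(tags):
    """Return the href of the first tag, or None if tags is empty."""
    return next((tag["href"] for tag in tags), None)


# Dicts stand in for BeautifulSoup tags here (an assumption for illustration).
# With a real page: first_href(soup.select("p.start-download [href]"))
fake_tags = [
    {"href": "http://example.com/a.exe"},
    {"href": "http://example.com/b.exe"},
]
print(first_href(fake_tags))
```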

Upvotes: 1
