Reputation: 488
I am trying some simple web scraping with Python, but I have a problem fetching the links: each book has 2 to 3 <a> tags with the class btn, as shown below, whereas I need only the first href printed for each book.
#!/usr/bin/python3
from bs4 import BeautifulSoup
import requests
url = "https://www.learndatasci.com/free-data-science-books/"
# Getting the webpage, creating a Response object.
response = requests.get(url)
# Extracting the source code of the page.
data = response.text
# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')
# Extracting all the <a> tags into a list.
tags = soup.find_all('a', class_='btn')
# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))
Output from the above code:
http://www.cin.ufpe.br/~tfl2/artificial-intelligence-modern-approach.9780131038059.25368.pdf
http://www.amazon.com/gp/product/0136042597/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0136042597&linkCode=as2&tag=learnds-20&linkId=3FRORB7P56CEWSK5
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
http://amzn.to/1WePh0N
http://www.e-booksdirectory.com/details.php?ebook=9575
http://amzn.to/1FcalRp
Desired output:
http://www.cin.ufpe.br/~tfl2/artificial-intelligence-modern-approach.9780131038059.25368.pdf
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
http://www.e-booksdirectory.com/details.php?ebook=9575
Upvotes: 1
Views: 186
Reputation: 1125398
BeautifulSoup has excellent CSS selector support; use that to pick every odd item:
soup = BeautifulSoup(data, 'lxml')
for tag in soup.select('a.btn:nth-of-type(odd)'):
    print(tag['href'])
Demo:
>>> for tag in soup.select('a.btn:nth-of-type(odd)'): print(tag['href'])
...
http://www.cin.ufpe.br/~tfl2/artificial-intelligence-modern-approach.9780131038059.25368.pdf
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
http://www.e-booksdirectory.com/details.php?ebook=9575
... etc
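To see why the odd selector works here, note that :nth-of-type counts <a> elements among siblings under the same parent, so the count restarts for every book. The following is a minimal sketch on made-up markup; the wrapper div, the example.com URLs, and the html.parser choice are assumptions for illustration, not the real page. It also shows the limitation: a book with three links yields its third link as well.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="book">
  <a class="btn" href="http://example.com/a.pdf">View Free Book</a>
  <a class="btn" href="http://example.com/a-buy">Buy</a>
</div>
<div class="book">
  <a class="btn" href="http://example.com/b.pdf">View Free Book</a>
  <a class="btn" href="http://example.com/b-buy">Buy</a>
  <a class="btn" href="http://example.com/b-extra">Extra</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# :nth-of-type(odd) matches the 1st, 3rd, 5th... <a> under each parent.
odd_links = [tag["href"] for tag in soup.select("a.btn:nth-of-type(odd)")]
print(odd_links)
```

With exactly two links per book this gives one link per book; with three, the extra odd-positioned link slips through.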
Each group of links also has a parent <div class="book"> element that you could make use of:
for tag in soup.select('.book a.btn:first-of-type'):
    print(tag['href'])
This would work for any number of links per book.
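As a rough, self-contained check of that claim, the sketch below runs the first-of-type selector on hypothetical markup where the books have different numbers of links. The markup, the example.com URLs, and the html.parser choice are assumptions for illustration. It also shows an equivalent loop that visits each book and takes its first matching link with select_one, which may read more plainly to some.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <div class="book"> per book, each with a
# different number of <a class="btn"> links (assumed structure).
BOOKS_HTML = """
<div class="book">
  <a class="btn" href="http://example.com/a.pdf">View Free Book</a>
  <a class="btn" href="http://example.com/a-buy">Buy</a>
</div>
<div class="book">
  <a class="btn" href="http://example.com/b.pdf">View Free Book</a>
  <a class="btn" href="http://example.com/b-buy">Buy</a>
  <a class="btn" href="http://example.com/b-extra">Extra</a>
</div>
<div class="book">
  <a class="btn" href="http://example.com/c.pdf">View Free Book</a>
</div>
"""

book_soup = BeautifulSoup(BOOKS_HTML, "html.parser")

# One selector: the first <a> among its siblings, inside any .book element.
first_links = [
    tag["href"] for tag in book_soup.select(".book a.btn:first-of-type")
]

# Equivalent loop: visit each book and take its first matching link, if any.
per_book_links = []
for book in book_soup.select(".book"):
    link = book.select_one("a.btn")
    if link is not None:
        per_book_links.append(link["href"])

print(first_links)
print(per_book_links)
```

Both approaches return one link per book on this sample, regardless of how many links each book carries.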
Upvotes: 2