APPGIS

Reputation: 353

Python scrape specific tag without class name

I'm developing a Python script to scrape data from a specific site, using the Beautiful Soup module. The data I'm interested in sits in an HTML structure like this:

<tbody aria-live="polite" aria-relevant="all">
  <tr  style="">
   <td>
      <a href="www.server.com/art/crag">Name</a>
   </td>
   <td class="nowrap"></td>
   <td class="hidden-xs"></td>
  </tr>
</tbody>

The tbody tag contains several tr tags, and from each one I would like to take only the first a tag inside a td.

I have tried in this way:

import requests
from bs4 import BeautifulSoup

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
a = soup.find(id='tabella_falist')
b = a.find("tbody")
link = [p.attrs['href'] for p in b.select("a")]

But this way the script collects the href of every a tag in every td. How can I take only the first one?

Thanks

Upvotes: 0

Views: 1861

Answers (2)

Arount

Reputation: 10403

This should do the job:

html = '''<html><body><tbody aria-live="polite" aria-relevant="all">
  <tr  style="">
   <td>
      <a href="www.server.com/art/crag">GOOD ONE</a>
      <a href="www.server.com/art/crag">NOT GOOD ONE</a>
   </td>
   <td class="nowrap">
      <a href="#">GOOD ONE</a>
   </td>
   <td class="hidden-xs"></td>
  </tr>
</tbody></body></html>'''

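from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# For each cell, print the href of its first link, if it has one
for td in soup.select('td'):
    a = td.find('a')
    if a is not None:
        print(a.attrs['href'])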
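
If the goal is one link per table row rather than one per cell, a variation is to loop over the tr elements instead. This is only a minimal sketch: the sample markup (including the second row and its URL) and the helper name first_link_per_row are made up for illustration and are not from the question's real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question's table.
SAMPLE_HTML = """
<table id="tabella_falist">
  <tbody>
    <tr>
      <td><a href="www.server.com/art/crag">Name</a></td>
      <td class="nowrap"><a href="#">Other</a></td>
      <td class="hidden-xs"></td>
    </tr>
    <tr>
      <td><a href="www.server.com/art/crag2">Name 2</a></td>
      <td class="nowrap"></td>
      <td class="hidden-xs"></td>
    </tr>
  </tbody>
</table>
"""


def first_link_per_row(html):
    """Return the href of the first <a> found in each <tr> (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for row in soup.find_all("tr"):
        a = row.find("a", href=True)  # first <a> with an href inside this row
        if a is not None:
            links.append(a["href"])
    return links


print(first_link_per_row(SAMPLE_HTML))
```

Rows without any link are simply skipped, so the result can be shorter than the number of rows.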

Upvotes: 0

Nurjan

Reputation: 6053

If I understood correctly you can try this:

from bs4 import BeautifulSoup
import requests

url = 'your_url'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

print(soup.a)

soup.a will return the first a tag on the page.
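
Because soup.a is shorthand for soup.find('a'), it yields a single tag for the whole document. To get the link from the first cell of every row, a CSS selector may be an option. The sketch below assumes Beautiful Soup 4.7 or later (where select() understands pseudo-classes such as :first-child) and uses made-up sample markup; only the id tabella_falist comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the question's table.
html = """
<table id="tabella_falist"><tbody>
  <tr><td><a href="/art/one">One</a></td><td class="nowrap"><a href="#">x</a></td></tr>
  <tr><td><a href="/art/two">Two</a></td><td class="nowrap"></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.a gives only the first <a> in the whole document.
first = soup.a

# Select <a> tags that are direct children of the first <td> of each row.
hrefs = [a["href"] for a in soup.select("#tabella_falist tbody tr > td:first-child > a")]
print(hrefs)
```

If the first cell can contain several links, this selector returns all of them; in that case calling find('a') on each cell is the simpler route.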

Upvotes: 1
