Nested tags web scraping python

Question

I am scraping fixed content from a particular website. The content lies inside nested div tags, as shown below:


  
    Time
        
            Full
        
  
  
    Branch
        
            IT
        
  
  
    Type
        
            Standard
        
  
  
    contact
        
            my location

I want to retrieve only the content of the strong tag inside the div 'overflow-hidden', which sits inside the span whose text is Branch. The code I've used is:

from bs4 import BeautifulSoup
import urllib2 
url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)
type = soup.find('div',attrs={"class":"table-info"}).findAll('span')
print type

I've scraped all the span content inside the main div 'table-info' so that I can use a conditional statement to retrieve the required content. But if I try to scrape the div content inside the span like this:

type = soup.find('div',attrs={"class":"table-info"}).findAll('span').find('div')
print type

I get this error:

AttributeError: 'list' object has no attribute 'find'

Can anyone please give me an idea of how to retrieve the content of the div inside the span? Thank you. I'm using Python 2.7.

Anish · Accepted Answer

It seems like you want to get the content from the second div inside the div 'table-info'. However, you are trying to reach it through a tag that has no relation to what you are trying to access.

 type = soup.find('div',attrs={"class":"table-info"}).findAll('span').find('div')

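raises the error because findAll returns a list of matching tags, and a list has no find method. find has to be called on an individual element of that list, not on the list itself.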
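To illustrate the point, here is a minimal sketch that calls find on each element of the result instead of on the result itself. The markup in `html` is an assumption made up for this example (the question's HTML lost its tags), and the names `spans`, `inner_divs` and `strong_texts` are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumed markup for illustration; the real page structure may differ.
html = (
    '<div class="table-info">'
    '<span>Time<div class="overflow-hidden"><strong>Full</strong></div></span>'
    '<span>Branch<div class="overflow-hidden"><strong>IT</strong></div></span>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# find_all (findAll is the older spelling) returns a list-like collection of tags
spans = soup.find("div", attrs={"class": "table-info"}).find_all("span")

# Call find on each tag in the collection, not on the collection
inner_divs = [span.find("div") for span in spans]

# Collect the text of the strong tag in each inner div, skipping spans without one
strong_texts = [
    d.strong.get_text()
    for d in inner_divs
    if d is not None and d.strong is not None
]
```

With this sample markup, `strong_texts` holds one entry per span, in document order.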

Try this instead:

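from bs4 import BeautifulSoup
import urllib2

url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)
# findAll returns a list of every div nested inside 'table-info'
divs = soup.find('div', attrs={"class": "table-info"}).findAll('div')
# The index depends on the page layout; index 2 picks the third div in that list
print divs[2].find('strong').string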
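A positional index breaks as soon as the page adds or reorders a row, so a lookup by label may be more robust. The sketch below is one possible approach, not the answer's method: it assumes each span holds its label as direct text followed by a div 'overflow-hidden' containing the strong tag. `SAMPLE_HTML` and the helper `strong_text_for_label` are hypothetical names introduced for this example.

```python
from bs4 import BeautifulSoup

# Assumed markup: the tags in the question were lost, so this structure is a guess.
SAMPLE_HTML = """
<div class="table-info">
  <span>Time
    <div class="overflow-hidden"><strong>Full</strong></div>
  </span>
  <span>Branch
    <div class="overflow-hidden"><strong>IT</strong></div>
  </span>
  <span>Type
    <div class="overflow-hidden"><strong>Standard</strong></div>
  </span>
</div>
"""


def strong_text_for_label(soup, label):
    """Return the strong text under the span whose own text equals label, or None."""
    table = soup.find("div", attrs={"class": "table-info"})
    if table is None:
        return None
    for span in table.find_all("span"):
        # Only text nodes that are direct children of the span count as its label
        own_texts = [s.strip() for s in span.find_all(string=True, recursive=False)]
        if label in own_texts:
            box = span.find("div", attrs={"class": "overflow-hidden"})
            strong = box.find("strong") if box is not None else None
            return strong.get_text(strip=True) if strong is not None else None
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(strong_text_for_label(soup, "Branch"))  # prints IT for the sample markup
```

The explicit "html.parser" argument keeps the div nested inside the span exactly as written; other parsers may restructure such markup, so the result is worth checking against the real page.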
