Reputation: 782
I am scraping fixed content from a particular website. The content lies inside nested divs, as shown below:
<div class="table-info">
<div>
<span>Time</span>
<div class="overflow-hidden">
<strong>Full</strong>
</div>
</div>
<div>
<span>Branch</span>
<div class="overflow-hidden">
<strong>IT</strong>
</div>
</div>
<div>
<span>Type</span>
<div class="overflow-hidden">
<strong>Standard</strong>
</div>
</div>
<div>
<span>contact</span>
<div class="overflow-hidden">
<strong>my location</strong>
</div>
</div>
</div>
I want to retrieve only the content of the strong tag inside the 'overflow-hidden' div that belongs to the span whose text is Branch. The code I've used is:
from bs4 import BeautifulSoup
import urllib2
url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)
type = soup.find('div',attrs={"class":"table-info"}).findAll('span')
print type
I've scraped all the span content inside the main div 'table-info', so that I can use a conditional statement to retrieve the required content. But if I try to scrape the div content inside the span like this:
type = soup.find('div',attrs={"class":"table-info"}).findAll('span').find('div')
print type
I get this error:
AttributeError: 'list' object has no attribute 'find'
Can anyone please give me some idea of how to retrieve the content of the div in the span? Thank you. I'm using Python 2.7.
Upvotes: 2
Views: 8101
Reputation: 1960
It seems you want the content of the second div inside the 'table-info' div. However, you are trying to reach it through the span tag, which does not contain the div you are trying to access.
type = soup.find('div',attrs={"class":"table-info"}).findAll('span').find('div')
raises the error because findAll returns a list, and a list has no find method.
Try this instead:
from bs4 import BeautifulSoup
import urllib2
url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)
# findAll('div') also returns nested divs, so index 2 is the second outer div (Branch)
type = soup.find('div',attrs={"class":"table-info"}).findAll('div')
print type[2].find('strong').string
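To show why index 2 lands on the Branch block, here is a small sketch. It assumes a trimmed copy of the markup from the question (only the Time and Branch rows) and the built-in 'html.parser'; the variable names are mine, not from the original code. It compares the default recursive search with a search limited to direct children.

```python
from bs4 import BeautifulSoup

# Trimmed sample of the markup in the question (assumption: only two rows kept).
HTML = """
<div class="table-info">
  <div><span>Time</span><div class="overflow-hidden"><strong>Full</strong></div></div>
  <div><span>Branch</span><div class="overflow-hidden"><strong>IT</strong></div></div>
</div>
"""

soup = BeautifulSoup(HTML, 'html.parser')
table = soup.find('div', attrs={'class': 'table-info'})

# Default search is recursive: outer div, its nested div, next outer div, ...
all_divs = table.find_all('div')
# recursive=False keeps only the direct children of table-info.
rows = table.find_all('div', recursive=False)

nested_value = all_divs[2].find('strong').string
direct_value = rows[1].find('strong').string
print(nested_value, direct_value)
```

With recursive=False the index matches the visual row number, which may be easier to reason about than counting nested divs. Either way, a positional index breaks if the site reorders the rows.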
Upvotes: 1
Reputation: 12092
findAll returns a list of BS elements, and find is defined on a single BS object, not on a list of BS objects, hence the error. The initial part of your code is fine. Do this instead:
from bs4 import BeautifulSoup
import urllib2
url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)
table = soup.find('div',attrs={"class":"table-info"})
spans = table.findAll('span')
branch_span = spans[1]  # the second span holds the text "Branch"
# Do your manipulation with branch_span
OR
from bs4 import BeautifulSoup
import urllib2
url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)
table = soup.find('div',attrs={"class":"table-info"})
spans = table.findAll('span')
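for span in spans:
    if span.text.strip().lower() == 'branch':
        # The value is in the div that follows the span, not inside the span
        print(span.find_next_sibling('div').find('strong').string)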
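For completeness, here is a self-contained sketch of the label-matching approach above. It parses the markup from the question as an inline string instead of fetching the URL, and it assumes the built-in 'html.parser'. The helper name value_for_label is hypothetical, made up for this illustration.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, so no network access is needed.
HTML = """
<div class="table-info">
  <div><span>Time</span><div class="overflow-hidden"><strong>Full</strong></div></div>
  <div><span>Branch</span><div class="overflow-hidden"><strong>IT</strong></div></div>
  <div><span>Type</span><div class="overflow-hidden"><strong>Standard</strong></div></div>
  <div><span>contact</span><div class="overflow-hidden"><strong>my location</strong></div></div>
</div>
"""


def value_for_label(soup, label):
    """Return the <strong> text paired with the span whose text equals label.

    Hypothetical helper: returns None if the table, label, or value is missing.
    """
    table = soup.find('div', attrs={'class': 'table-info'})
    if table is None:
        return None
    for span in table.find_all('span'):
        if span.get_text(strip=True).lower() == label.lower():
            # The value div is a sibling of the span, not a child of it.
            value_div = span.find_next_sibling('div')
            strong = value_div.find('strong') if value_div else None
            return strong.get_text(strip=True) if strong else None
    return None


soup = BeautifulSoup(HTML, 'html.parser')
branch = value_for_label(soup, 'Branch')
print(branch)
```

Matching on the label text rather than on a position means the lookup should keep working if the rows are reordered, at the cost of depending on the exact label wording.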
Upvotes: 0