Reputation: 55
I'm not sure how to ask this properly. I just started with Python, and I'm trying to build a crawler. Everything works, except that I can't select the second of two divs in the body that share the same class name. I've searched the internet for help, but the code I find is structured differently from mine. The HTML looks something like this:
<div class="card">
<div class="card-body">...</div>
<div class="card-body">...</div>
</div>
My code:
import requests
from bs4 import BeautifulSoup

# comp_card is a Tag located earlier in the crawler (not shown)
comp_link = comp_card.find('a', class_='link')
href_link = comp_link['href']
link_final = 'https://www.someweb.com' + href_link
prof_text = requests.get(link_final).text
prof_soup = BeautifulSoup(prof_text, 'lxml')
comp_name = prof_soup.find('h2', class_='company-name').text.strip()
comp_info = prof_soup.find('div', class_='col-md-12 col-lg-4')
but when I try to use
comp_info = comp_info.find('div', class_ = 'card-body'[1])
it doesn't work. I've tried experimenting and adapting other people's solutions from Stack Overflow, but I couldn't get any of them to work.
Upvotes: 0
Views: 878
Reputation: 1734
I often prefer CSS selectors. In this simple case, you can select the div that has the class card-body and is the second child of its parent, using the :nth-child selector:
import bs4
html = """
<div class="card">
<div class="card-body">Not this</div>
<div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
print(soup.select('div.card-body:nth-child(2)'))
Output
[<div class="card-body">But this</div>]
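Since select() always returns a list, it may be handier to use select_one() when you only want a single element. The following is a minimal sketch under the same sample markup; the names second and second_text are only for illustration:

```python
from bs4 import BeautifulSoup

html = """
<div class="card">
<div class="card-body">Not this</div>
<div class="card-body">But this</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching Tag, or None if nothing matches
second = soup.select_one("div.card-body:nth-child(2)")

# Guard against None before reading the text, in case the page layout differs
second_text = second.get_text(strip=True) if second is not None else None
print(second_text)
```

In the crawler from the question, the same selector could presumably be applied to comp_info instead of soup, so that the search stays inside that column.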
If the targeted element is not actually the second child, but simply the second element with the class card-body, it may be advantageous to use :nth-child(n of selector). This counts only the siblings that match the specified selector and selects the nth one of them:
html = """
<div class="card">
<div class="other-class">Not this</div>
<div class="card-body">Or this</div>
<div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
print(soup.select('div:nth-child(2 of .card-body)'))
Output
[<div class="card-body">But this</div>]
BeautifulSoup's CSS selector logic is driven by the SoupSieve library, and more information can be found here: https://facelessuser.github.io/soupsieve/selectors/pseudo-classes/#:nth-child.
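As for why the attempt in the question fails: 'card-body'[1] indexes the string itself and evaluates to 'a', so find() ends up searching for a class named a. If you would rather stay with find-style calls, one option is to collect all matches with find_all() and index the resulting list. This is a rough sketch on sample markup; the names card, bodies and second_body are only for illustration:

```python
from bs4 import BeautifulSoup

html = """
<div class="card">
<div class="other-class">Not this</div>
<div class="card-body">Or this</div>
<div class="card-body">But this</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("div", class_="card")

# find_all() returns a list of every matching descendant, in document order
bodies = card.find_all("div", class_="card-body")

# Index the list, not the class-name string; check the length first
second_body = bodies[1] if len(bodies) > 1 else None
print(second_body)
```

Note that find_all() searches all descendants by default, so nested card-body divs would also be counted; passing recursive=False restricts the search to direct children.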
Upvotes: 0