Dastan Taibosunov

Reputation: 55

How do I find the second div with the same class name?

I don't even know how to properly ask this. I just started with Python, and I'm trying to make a crawler. Everything works fine, but I can't "call" or "find" the second div when several divs in the body have identical class names. I've been searching the internet for help, but the way people write their code is not similar to what I wrote. The HTML looks something like this:

<div class="card">
    <div class="card-body">...</div>
    <div class="card-body">...</div>

My code:

import requests
from bs4 import BeautifulSoup

# comp_card is a tag found earlier in the crawler (not shown here)
comp_link = comp_card.find('a', class_='link')
href_link = comp_link['href']
link_final = 'https://www.someweb.com' + href_link
prof_text = requests.get(link_final).text
prof_soup = BeautifulSoup(prof_text, 'lxml')
comp_name = prof_soup.find('h2', class_='company-name').text.strip()
comp_info = prof_soup.find('div', class_='col-md-12 col-lg-4')

But when I try to use

comp_info = comp_info.find('div', class_ = 'card-body'[1])

it doesn't work. I've tried experimenting and using other people's solutions from Stack Overflow (but I'm too dumb).

Upvotes: 0

Views: 878

Answers (1)

facelessuser

Reputation: 1734

Often, I prefer using CSS selectors. In this simple case, you could select the element that has the class name card-body and is the second child of its parent. You can use the :nth-child selector to grab the second div:

import bs4

html = """
<div class="card">
    <div class="card-body">Not this</div>
    <div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a parser warning
print(soup.select('div.card-body:nth-child(2)'))

Output

[<div class="card-body">But this</div>]
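
Note that select returns a list, even when only one element matches. As a minimal sketch, assuming you want a single element rather than a list, select_one returns the first match (or None if nothing matches). The variable names second and second_text are just illustrative:

```python
import bs4

html = """
<div class="card">
    <div class="card-body">Not this</div>
    <div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# select_one returns the first matching Tag, or None when there is no match
second = soup.select_one("div.card-body:nth-child(2)")

# Guard against None before reading the text
second_text = second.get_text(strip=True) if second is not None else None
print(second_text)
```

Checking for None matters in a crawler, since a page that lacks the expected markup would otherwise raise an AttributeError.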

If the targeted element is not actually the second child, but simply the second element with the class card-body, it may be advantageous to use :nth-child(n of selector). This selects the nth element among the siblings that match the specified selector:

html = """
<div class="card">
    <div class="other-class">Not this</div>
    <div class="card-body">Or this</div>
    <div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a parser warning
print(soup.select('div:nth-child(2 of .card-body)'))

Output

[<div class="card-body">But this</div>]

BeautifulSoup's CSS selector logic is driven by the SoupSieve library, and more information can be found here: https://facelessuser.github.io/soupsieve/selectors/pseudo-classes/#:nth-child.
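
For comparison with the attempt in the question: 'card-body'[1] indexes the string itself and evaluates to 'a', so find ends up looking for a div with class a. If you would rather stay with the find-style API, a rough sketch (not the only way to do it) is to call find_all and index the resulting list. The names bodies and second_body are made up for this example:

```python
import bs4

html = """
<div class="card">
    <div class="other-class">Not this</div>
    <div class="card-body">Or this</div>
    <div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# Indexing the string picks one character, not the second match
wrong_class = "card-body"[1]
print(wrong_class)

# find_all returns every match in document order; index the list instead
bodies = soup.find_all("div", class_="card-body")
second_body = bodies[1] if len(bodies) > 1 else None
print(second_body)
```

Keep in mind that find_all searches all descendants, so this picks the second match in document order under the tag you call it on, which is not necessarily the second matching sibling.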

Upvotes: 0
