hanuta98

Reputation: 89

BeautifulSoup doesn't find all tags in parsed HTML?

HTML code from my original file (line 91 contains the tags I would like to find):

<section class="lectsect" id="somesection">
    <h2><a href="#somesection">Some Title</a></h2>
    <div class="row">
        <div class="col-md-7">
            <div class="lectures-thumb">
                <div class="lect">
                    <div class="padbox">
                        <div class="row">
                            <div class="col-md-3">
                                <img src="images/contact-image.jpg" height=115 width=115>
                            </div>
                            <div class="col-md-9">
                                <h3><a href="ieditedthesesnippetsseparatelyfuckme.edu">Blargh</a></h3>
                                <a class="lecturer" id="anderson" href="somepage">FindMe</a>,     <a href="noinfo">Blorgl</a>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

The same section as returned by the find method of the parsed HTML document:

In [30]: parsed.find(id="somesection")                                             
Out[30]: 
<section class="lectsect" id="somesection">
    <h2><a href="#somesection">Some Section Title</a></h2>
    <div class="row">
        <div class="col-md-7">
            <div class="lectures-thumb">
                <div class="lect">
                    <div class="padbox">
                        <div class="row">
                            <div class="col-md-3">
                                <img height="115" src="images/contact-image.jpg" width="115"/>
                            </div>
                            <div class="col-md-9">
                                <h3><a href="blablo#">Anonymized</a></h3>
                                <a href="blabla">FindMe</a>, <a href="noinfohere">Whatever</a>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

As you can see, the parsed code no longer contains the class and id attributes from line 91. Accordingly, the following returns an empty list:

In [29]: parsed.findAll("a", {"class": "lecturer"})                                
Out[29]: []
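A minimal sketch to narrow this down, assuming bs4 is installed and using the built-in html.parser instead of lxml. The two trimmed snippets are hypothetical stand-ins for the original file and the parsed output. Running the same class and id lookups (find_all is the current spelling of findAll) against both suggests the search calls are fine and the empty list comes from the markup BeautifulSoup actually received:

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed copy of the original file: class and id are present.
ORIGINAL = """
<section class="lectsect" id="somesection">
  <div class="col-md-9">
    <h3><a href="x">Blargh</a></h3>
    <a class="lecturer" id="anderson" href="somepage">FindMe</a>, <a href="noinfo">Blorgl</a>
  </div>
</section>
"""

# Same markup with the attributes missing, as in the parsed output.
STRIPPED = """
<section class="lectsect" id="somesection">
  <div class="col-md-9">
    <h3><a href="x">Blargh</a></h3>
    <a href="somepage">FindMe</a>, <a href="noinfo">Blorgl</a>
  </div>
</section>
"""


def lecturer_hits(markup):
    """Return (texts matched by class, text matched by id or None)."""
    soup = BeautifulSoup(markup, "html.parser")
    # Same query as in the question, using the current find_all name.
    by_class = [a.get_text() for a in soup.find_all("a", {"class": "lecturer"})]
    by_id = soup.find(id="anderson")
    id_text = by_id.get_text() if by_id is not None else None
    return by_class, id_text


print(lecturer_hits(ORIGINAL))  # (['FindMe'], 'FindMe')
print(lecturer_hits(STRIPPED))  # ([], None)
```

If the lookups succeed on the file's own markup but fail on what was parsed, it is worth checking where the parsed HTML came from rather than changing the query.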

How can I get the content of this <a> element by its class or id?

Upvotes: 1

Views: 49

Answers (1)

QHarr

Reputation: 84455

Use the relationships between the elements that are still present. For example, select the <a> that directly follows the <h3> inside #somesection:

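from bs4 import BeautifulSoup as bs
soup = bs(html, 'lxml')  # html: the page source you already have
# adjacent-sibling combinator: the first <a> element right after the <h3>
print(soup.select_one('#somesection h3 + a').text)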
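A self-contained sketch of the same idea, assuming bs4 is installed and using html.parser in place of lxml. The inline HTML is a hypothetical, trimmed copy of the parsed output (no class or id on the target link). It shows the CSS selector from above next to an equivalent lookup with the navigation method find_next_sibling:

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed markup: the target <a> has no class or id.
HTML = """
<section class="lectsect" id="somesection">
  <div class="col-md-9">
    <h3><a href="blablo#">Anonymized</a></h3>
    <a href="blabla">FindMe</a>, <a href="noinfohere">Whatever</a>
  </div>
</section>
"""

soup = BeautifulSoup(HTML, "html.parser")

# 1) CSS adjacent-sibling combinator: the <a> element immediately after the <h3>.
#    Whitespace text nodes between them are ignored by the combinator.
via_css = soup.select_one("#somesection h3 + a").get_text()

# 2) Navigation API: find the <h3> inside the section, then its next <a> sibling.
h3 = soup.find(id="somesection").find("h3")
via_nav = h3.find_next_sibling("a").get_text()

print(via_css, via_nav)  # FindMe FindMe
```

Both approaches depend on the element order staying the same, so they are less robust than a class or id lookup when those attributes are available.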

Upvotes: 1
