Reputation: 89
HTML from my original file. Line 91 of that file (the a element with class "lecturer") contains the attributes I would like to find:
<section class="lectsect" id="somesection">
<h2><a href="#somesection">Some Title</a></h2>
<div class="row">
<div class="col-md-7">
<div class="lectures-thumb">
<div class="lect">
<div class="padbox">
<div class="row">
<div class="col-md-3">
<img src="images/contact-image.jpg" height=115 width=115>
</div>
<div class="col-md-9">
<h3><a href="ieditedthesesnippetsseparatelyfuckme.edu">Blargh</a></h3>
<a class="lecturer" id="anderson" href="somepage">FindMe</a>, <a href="noinfo">Blorgl</a>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
The same section as returned by the find method of the parsed HTML document:
In [30]: parsed.find(id="somesection")
Out[30]:
<section class="lectsect" id="somesection">
<h2><a href="#somesection">Some Section Title</a></h2>
<div class="row">
<div class="col-md-7">
<div class="lectures-thumb">
<div class="lect">
<div class="padbox">
<div class="row">
<div class="col-md-3">
<img height="115" src="images/contact-image.jpg" width="115"/>
</div>
<div class="col-md-9">
<h3><a href="blablo#">Anonymized</a></h3>
<a href="blabla">FindMe</a>, <a href="noinfohere">Whatever</a>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
As you can see, the parsed code no longer contains the class and id attributes from line 91. Consequently, the following returns an empty list:
In [29]: parsed.findAll("a", {"class": "lecturer"})
Out[29]: []
How do I find the content of this a element by class or id?
Upvotes: 1
Views: 49
Reputation: 84455
Use the relationships between the elements that are still present. For example, the anchor you want is the element immediately after the h3:
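from bs4 import BeautifulSoup as bs
soup = bs(html, 'lxml')
# 'h3 + a' matches the <a> that directly follows the <h3> inside #somesection
print(soup.select_one('#somesection h3 + a').text)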
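To try the sibling selector without the original page, here is a minimal self-contained sketch. The HTML string is a shortened, hypothetical stand-in for the parsed output in the question, and it uses the built-in html.parser instead of lxml so nothing beyond bs4 is needed. The variable names first_lecturer and all_after_heading are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the parsed page: the target anchor
# has no class or id, so it can only be reached via its neighbours.
html = """
<section class="lectsect" id="somesection">
  <h2><a href="#somesection">Some Title</a></h2>
  <div class="col-md-9">
    <h3><a href="blablo#">Anonymized</a></h3>
    <a href="blabla">FindMe</a>, <a href="noinfohere">Whatever</a>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# "h3 + a": the <a> that is the next element sibling of an <h3>
first_lecturer = soup.select_one("#somesection h3 + a")

# "h3 ~ a": every <a> that is a later sibling of an <h3>
all_after_heading = [a.get_text() for a in soup.select("#somesection h3 ~ a")]

print(first_lecturer.get_text())
print(all_after_heading)
```

The adjacent-sibling combinator (+) picks only the first anchor after the heading, while the general-sibling combinator (~) also picks up the second one. Neither matches the link inside the h3 or the one inside the h2, since those are not siblings of the h3.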
Upvotes: 1