big11mac

Reputation: 189

Accessing nested elements with BeautifulSoup

I have the following HTML:

<div id="contentDiv">
    <!-- START FILER DIV -->
    <div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;">
    <div class="mailer">Mailing Address
        <span class="mailerAddress">500 ORACLE PARKWAY</span>
        <span class="mailerAddress">MAIL STOP 5 OP 7</span>
        <span class="mailerAddress">REDWOOD CITY CA 94065</span>
     </div>

I am trying to access "500 ORACLE PARKWAY" and "MAIL STOP 5 OP 7", but I cannot find a way to do it. My attempt was this:

for item in soup.findAll("span", {"class" : "mailerAddress"}):
    if item.parent.name == 'div':
        return_list.append(item.contents)

Edit: I forgot to mention that later elements in the HTML use the same tags and class, so my loop captures all of them when I only want the first two.

Edit: link: https://www.sec.gov/cgi-bin/browse-edgar?CIK=orcl

Upvotes: 1

Views: 3563

Answers (2)

SIM

Reputation: 22440

Try this:

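from bs4 import BeautifulSoup
import requests

# Fetch the page and parse it with the lxml parser
res = requests.get("https://www.sec.gov/cgi-bin/browse-edgar?CIK=orcl").text
soup = BeautifulSoup(res, 'lxml')
# Keep only the first two elements with class "mailerAddress"
for item in soup.find_all(class_="mailerAddress")[:2]:
    print(item.text)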

Result:

500 ORACLE PARKWAY
MAIL STOP 5 OP 7
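
If relying on position across the whole page feels fragile, one possible variant is to locate the first div with class "mailer" and search only inside it. This is a minimal sketch run against an inline copy of the HTML from the question rather than the live page; the second "mailer" div is a hypothetical stand-in for the later, similar elements the question mentions.

```python
from bs4 import BeautifulSoup

# Inline sample based on the question's HTML. The second "mailer" div is
# an assumption, added only to mimic the later elements with similar tags.
html = """
<div id="contentDiv">
  <div class="mailer">Mailing Address
    <span class="mailerAddress">500 ORACLE PARKWAY</span>
    <span class="mailerAddress">MAIL STOP 5 OP 7</span>
    <span class="mailerAddress">REDWOOD CITY CA 94065</span>
  </div>
  <div class="mailer">Business Address
    <span class="mailerAddress">500 ORACLE PARKWAY</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match in document order
first_mailer = soup.find("div", class_="mailer")

# Search only inside that div; limit=2 stops after two matches
lines = [
    span.get_text(strip=True)
    for span in first_mailer.find_all("span", class_="mailerAddress", limit=2)
]
print(lines)  # ['500 ORACLE PARKWAY', 'MAIL STOP 5 OP 7']
```

Scoping the search to one parent means spans in later blocks cannot leak into the result, even if the order of elements elsewhere on the page changes.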

Upvotes: 1

E_K

Reputation: 104

I'm going to attempt to answer this with the little bit of information we have. If you just want the first two elements of a certain class on a webpage, you can use slicing.

soup.findAll("span", {"class" : "mailerAddress"})[0:2]
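
Note that the slice gives back Tag objects, not strings, so you still need to pull the text out of each one. Here is a small sketch of that, using an inline copy of the question's HTML instead of the live page (the `html` string and variable names are just for illustration). It also shows `limit=2`, which should select the same two elements without collecting every match first.

```python
from bs4 import BeautifulSoup

# Inline sample copied from the question, used here instead of the live page
html = """
<div class="mailer">Mailing Address
    <span class="mailerAddress">500 ORACLE PARKWAY</span>
    <span class="mailerAddress">MAIL STOP 5 OP 7</span>
    <span class="mailerAddress">REDWOOD CITY CA 94065</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Slice the full result list down to the first two Tag objects
spans = soup.find_all("span", {"class": "mailerAddress"})[0:2]

# Extract the text of each Tag
addresses = [span.get_text(strip=True) for span in spans]
print(addresses)  # ['500 ORACLE PARKWAY', 'MAIL STOP 5 OP 7']

# Alternative: ask find_all to stop after two matches
limited = soup.find_all("span", class_="mailerAddress", limit=2)
print([span.get_text(strip=True) for span in limited])
```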

Upvotes: 0
