thanks_in_advance

Reputation: 2743

beautifulsoup4 - grab Sibling element if Sibling present

The most common repetitive structure of the HTML is:

  <p class="Standard">
   <span class="T3">
    it is possible for you
   </span>
  </p>

In such cases I grab the text: it is possible for you

Occasionally (i.e., not always), the <p> of class="Standard" has a sibling <p> of class="P3", like so:

  <p class="P3">
   (to ask a question in Spanish, you just use inflection)
  </p>

When this <p> of class="P3" is present, I want to additionally grab the text inside it, e.g. here I would additionally grab: (to ask a question in Spanish, you just use inflection)

My question is, given this kind of structure:

<div>
...
  <p class="Standard">
   <span class="T3">
    it is possible for you
   </span>
  </p>

  <p class="Standard">
   <span class="T3">
    it is acceptable for me
   </span>
  </p>

  <p class="P3">
   (to ask a question in Spanish, you just use inflection)
  </p>
...
</div>

How can I produce output like this:

it is possible for you
it is acceptable for me
(to ask a question in Spanish, you just use inflection)

Currently, I've managed to do this:

p_standards = soup.find_all("p", class_ = "Standard")

for p_standard in p_standards:
    p_english = p_standard.find("span", class_="T3")
    print(p_english.contents[0])

And the output I get is:

it is possible for you
it is acceptable for me

Upvotes: 0

Views: 56

Answers (2)

QHarr

Reputation: 84465

I think it is more efficient to do this with a CSS selector list (the comma acts as "or") combined with an adjacent sibling combinator (+):

from bs4 import BeautifulSoup as bs

html = '''
<div>
...
  <p class="Standard">
   <span class="T3">
    it is possible for you
   </span>
  </p>

  <p class="Standard">
   <span class="T3">
    it is acceptable for me
   </span>
  </p>

  <p class="P3">
   (to ask a question in Spanish, you just use inflection)
  </p>
...
</div>
'''
soup = bs(html, 'lxml')
items = [i.text.strip() for i in soup.select('.Standard, .Standard + .P3')]
print(items)
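To make the behaviour of that selector easier to see, here is a minimal sketch. It assumes the built-in html.parser instead of lxml, and it adds one hypothetical extra <p class="P3"> ("orphan note") that does not directly follow a "Standard" paragraph. The helper name grab_texts is also just an illustration. Matches come back in document order, and the extra P3 is skipped because `+` only matches an element whose immediately preceding sibling is `.Standard`.

```python
from bs4 import BeautifulSoup

# Markup from the question, plus one hypothetical <p class="P3"> at the end
# that is NOT directly preceded by a <p class="Standard">.
html = '''
<div>
  <p class="Standard"><span class="T3">it is possible for you</span></p>
  <p class="Standard"><span class="T3">it is acceptable for me</span></p>
  <p class="P3">(to ask a question in Spanish, you just use inflection)</p>
  <p class="P3">orphan note</p>
</div>
'''


def grab_texts(markup):
    """Return the text of every .Standard and of every .P3 right after one."""
    soup = BeautifulSoup(markup, 'html.parser')
    # "A, B" selects elements matching A or B; ".Standard + .P3" selects a
    # .P3 element whose previous sibling element is a .Standard element.
    matches = soup.select('.Standard, .Standard + .P3')
    return [tag.get_text(strip=True) for tag in matches]


for line in grab_texts(html):
    print(line)
```

If a P3 paragraph can be separated from its "Standard" paragraph by other elements, this selector will not pick it up, so it is worth checking against the real pages.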

Upvotes: 1

lagripe

Reputation: 764

You can check the next sibling of each <p class="Standard"> with find_next_sibling():

from bs4 import BeautifulSoup
text = '''
<div>

  <p class="Standard">
   <span class="T3">
    it is possible for you
   </span>
  </p>

  <p class="Standard">
   <span class="T3">
    it is acceptable for me
   </span>

  </p>
  <p class="P3">
   (to ask a question in Spanish, you just use inflection)
  </p>

</div>
'''
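soup = BeautifulSoup(text, features='html.parser')
p_standards = soup.find_all("p", class_="Standard")

for p_standard in p_standards:
    p_english = p_standard.find('span', attrs={'class': 'T3'})
    # next sibling tag, or None if this <p> is the last element in its parent
    nextSibling = p_standard.find_next_sibling()
    print(p_english.text.strip())
    # guard against a missing sibling or a sibling without a class attribute
    if (nextSibling is not None and nextSibling.name == 'p'
            and nextSibling.attrs.get('class', [None])[0] == 'P3'):
        print(nextSibling.text.strip())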


Explanation :

  • To read the class of the element returned by find_next_sibling(), I inspected the instance's own variables by printing nextSibling.__dict__.keys(); the tag's attributes are stored in its attrs dictionary.
  • The index 0 is needed because class is a multi-valued attribute, so its value is a list.
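The points above can be checked with a small sketch. The markup here is made up for illustration (it is not from the question), and it assumes the html.parser backend. It shows that class comes back as a list, that a tag without a class has no such key, and that find_next_sibling() returns None for the last element, which is why it is safer to guard both cases before indexing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <p> with two classes, one <p> with none.
soup = BeautifulSoup(
    '<div><p class="P3 note">a</p><p>b</p></div>', 'html.parser'
)
first, last = soup.find_all('p')

# class is multi-valued, so Beautiful Soup stores it as a list of strings
class_list = first['class']

# a tag without a class attribute has no 'class' key; .get() returns None
no_class = last.get('class')

# the last <p> has no following sibling tag, so the result is None
no_sibling = last.find_next_sibling()

print(class_list, no_class, no_sibling)
```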

Upvotes: 1
