beautifulsoup4 - grab Sibling element if Sibling present

Question

The most common repeating structure in the HTML is:

  <p class="Standard">
   <span class="T3">
    it is possible for you
   </span>
  </p>

In such situations I grab the text: it is possible for you

Occasionally (i.e., not always), the <p> of class="Standard" has a sibling <p> of class="P3", like so:

  <p class="P3">
   (to ask a question in Spanish, you just use inflection)
  </p>

When this <p> of class="P3" is present, I also want to grab the text inside it. Here, for example, I would additionally grab: (to ask a question in Spanish, you just use inflection)

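My question is, given this kind of structure:

...
  <p class="Standard">
   <span class="T3">
    it is possible for you
   </span>
  </p>

  <p class="Standard">
   <span class="T3">
    it is acceptable for me
   </span>
  </p>

  <p class="P3">
   (to ask a question in Spanish, you just use inflection)
  </p>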
...

How can I produce output like this:

it is possible for you
it is acceptable for me
(to ask a question in Spanish, you just use inflection)

Currently, I've managed to do this:

p_standards = soup.find_all("p", class_ = "Standard")

for p_standard in p_standards:
    p_english = p_standard.find("span", class_="T3")
    print(p_english.contents[0])

And the output I get is:

it is possible for you
it is acceptable for me

lagripe · Accepted Answer

Use this:

Python code:

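from bs4 import BeautifulSoup

text = '''
  <p class="Standard">
   <span class="T3">
    it is possible for you
   </span>
  </p>

  <p class="Standard">
   <span class="T3">
    it is acceptable for me
   </span>
  </p>

  <p class="P3">
   (to ask a question in Spanish, you just use inflection)
  </p>
'''
soup = BeautifulSoup(text, features='html.parser')
p_standards = soup.find_all("p", class_="Standard")

for p_standard in p_standards:
    p_english = p_standard.find('span', attrs={'class': 'T3'})
    # Next sibling tag (whitespace text nodes are skipped); None if there is none
    nextSibling = p_standard.find_next_sibling()
    # strip() drops the whitespace around the text inside the tag
    print(p_english.text.strip())
    # Guard against a missing sibling or a sibling without a class attribute
    if (nextSibling is not None and nextSibling.name == 'p'
            and nextSibling.attrs.get('class', [''])[0] == 'P3'):
        print(nextSibling.text.strip())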


Explanation:

To read the class value of the element returned by find_next_sibling, I inspected the variables of the instance itself by printing nextSibling.__dict__.keys(), since I could not find this mentioned in the official documentation. That is where attrs shows up.
The index 0 is needed because the value of the class attribute is a list.
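
As a variation on the same idea, here is a minimal sketch that wraps the sibling check in a function and tests class membership with `in` rather than indexing position 0, so a sibling with several classes would still match. The function name collect_lines and the HTML constant are made up for this illustration, and the sample markup is assumed from the structure described in the question.

```python
from bs4 import BeautifulSoup

# Sample markup assumed from the structure described in the question.
HTML = """
<p class="Standard"><span class="T3">it is possible for you</span></p>
<p class="Standard"><span class="T3">it is acceptable for me</span></p>
<p class="P3">(to ask a question in Spanish, you just use inflection)</p>
"""


def collect_lines(html):
    """Return each T3 span text, plus the text of a directly following p.P3."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for p_standard in soup.find_all("p", class_="Standard"):
        span = p_standard.find("span", class_="T3")
        if span is not None:
            lines.append(span.get_text(strip=True))
        # find_next_sibling() returns the next sibling tag, or None at the end
        sibling = p_standard.find_next_sibling()
        # "class" is multi-valued, so its value is a list of class names
        if (sibling is not None and sibling.name == "p"
                and "P3" in sibling.get("class", [])):
            lines.append(sibling.get_text(strip=True))
    return lines


for line in collect_lines(HTML):
    print(line)
```

Because the function returns a list instead of printing inside the loop, the same logic can be reused for writing to a file or for further processing.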
