Beautifulsoup find_all() with multiple AND conditions

Question

I have an HTML source with multiple <sup> tags, some of which also contain <a> tags. My goal is to delete all superscripts that contain a link while wrapping the superscripts without a link in square brackets. My code works to a certain extent: it decomposes the correct tags but also wraps the now-empty tags in brackets. I tried to find a way to use find_all() with multiple AND conditions for the <sup> and <a> tags, but to no avail. P.S.: SO has a thread regarding multiple OR conditions.

from bs4 import BeautifulSoup

html = '''
         Heading 
      ¹
    
    
      ²
      This is a paragraph
      ³
    '''

soup = BeautifulSoup(html, "html.parser")

# remove superscripts with links
for superscript in soup.find_all("sup"):
    for suplink in superscript.find_all("a"):
        suplink.decompose()

# wrap remaining superscripts in brackets
    superscript.insert(0, "[")
    superscript.insert(len(superscript.contents), "]")

print(soup)

Result:

    
     Heading 
    ^{[
    
    ]}
    
    
    ^[2]
        This is a paragraph
        ^[]

What it should look like:

    
     Heading 
    
    
    ^[2]
          This is a paragraph

Ajax1234 · Accepted Answer

You can recursively traverse the source and update the sups accordingly:

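import bs4

def update_sup(d):
    if d.name == 'sup':
        # A <sup> with any child tag (e.g. a link) is removed entirely
        if any(isinstance(i, bs4.element.Tag) for i in d.contents):
            d.extract()
            return
        # A text-only <sup> gets its text wrapped in square brackets
        d.string = f'[{d.get_text(strip=True)}]'
    # Iterate over a copy, since extract() mutates the parent's contents
    for i in list(d.contents):
        if isinstance(i, bs4.element.Tag):
            update_sup(i)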

html = '''
     Heading 
  ¹


  ²
  This is a paragraph
  ³
'''
d = bs4.BeautifulSoup(html, 'html.parser')
update_sup(d)
print(d)

Output:


 Heading 



^[2]
  This is a paragraph
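
As for the AND condition asked about in the title: find_all() also accepts a function as its filter, so both conditions can be combined in one predicate. Below is a minimal sketch of that approach. The sample markup and the helper name bracket_plain_sups are assumptions made for illustration, not the original input.

```python
from bs4 import BeautifulSoup

# Hypothetical input: an assumed shape with linked and plain superscripts.
html = """
<h1>Heading<sup><a href="#fn1">1</a></sup></h1>
<p><sup>2</sup>This is a paragraph<sup><a href="#fn3">3</a></sup></p>
"""

def bracket_plain_sups(markup):
    """Drop <sup> tags that contain a link; wrap the others in brackets."""
    soup = BeautifulSoup(markup, "html.parser")

    # AND condition: the tag is a <sup> AND it has an <a> descendant.
    linked = soup.find_all(
        lambda tag: tag.name == "sup" and tag.find("a") is not None
    )
    for sup in linked:
        sup.decompose()

    # Every <sup> still in the tree has no link, so wrap its text.
    for sup in soup.find_all("sup"):
        sup.string = f"[{sup.get_text(strip=True)}]"
    return soup

result = bracket_plain_sups(html)
print(result)
```

Removing the linked superscripts first and bracketing in a second pass avoids the empty "[]" from the question, because no emptied tag is left to wrap. If your bs4 installation uses soupsieve for CSS selectors, soup.select("sup:has(a)") should pick out the same tags.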
