Elegant way to replace the tag on a sequence of bs4 soup tags and wrap them all in another tag

Question

I have an HTML string like this:

test = """
    A header
     Just a normal paragraph before 
    • test element
    • test element2
     Following stuff

    
    """

The user has typed the u'\u2022' bullet character explicitly instead of using a list. I would like to convert this to the following HTML:


    A header
     Just a normal paragraph before 
    <ul>
    <li>test element</li>
    <li>test element2</li>
    </ul>
     Following stuff

What is the most elegant way to approach this? I can identify where these bulleted items occur with a simple .find on the tag string. I can remove the bullets and wrap each item in <li> tags. But I don't know how to iterate through the soup and then wrap all the bullets into a single <ul>. I imagine something like:

new_soup = []
bullets = []  # consecutive bulleted tags collected so far
for tag in soup:
   if has_bullet(tag):
       #  start storing tags
       bullets.append(tag)
   else:
       if bullets: # if we have some bullets to dump
           new_soup.append(ul_tag_start)
           new_soup.extend(modify_text(bullets))
           new_soup.append(ul_tag_end)
       new_soup.append(tag)
       # clear bullets list
       bullets = []

but I don't know how to write a new soup element by element, and I wonder if there's a better way using bs4's various insert, insert_before, insert_after, etc. methods.

Ajax1234 · Accepted Answer

You can use recursion:

import bs4, re
from bs4 import BeautifulSoup as soup
test = """A header
 Just a normal paragraph before 
• test element
• test element2something else
 Following stuff"""
def form_ul(d):
   return soup('<ul>{}</ul>'.format('\n'.join(f'<li>{i}</li>' for i in d)), 'html.parser').ul

def to_ul(d):
   c,l = [],[]
   for i in d.contents:
      if isinstance(i, bs4.NavigableString):
         c.append(i)
      else:
         if str(i.get_text(strip=True)).startswith(u'\u2022'):
            l.append('\n'.join(j.replace(u'\u2022 ', '') if isinstance(j, bs4.NavigableString) else str(j) for j in i.contents))
         else:
            if l:
               c.append(form_ul(l))
               l = []
            to_ul(i)
            c.append(i)
   if l:
      c.append(form_ul(l))
   d.contents = [j for j in c if not re.findall('^\n+$', str(j))]

html = soup(test, 'html.parser')
to_ul(html)
print(html.prettify())

Output:


 A header


 Just a normal paragraph
 
  before
 


 
  test element
 
 
  test element2
  
   something else
  
 

 Following
 
  stuff
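
Since the question asks about bs4's own tree-editing methods, here is a minimal non-recursive sketch that uses new_tag, insert_before and append instead of rebuilding contents by hand. It is only an illustration under some assumptions: the <h1>/<p> markup in the sample string is a guess (the original tags are not shown above), the helper names is_bullet, strip_bullet and bullets_to_lists are hypothetical, and only top-level tags are examined.

```python
from bs4 import BeautifulSoup

BULLET = "\u2022"

# Assumed markup: the <h1>/<p> tags are a guess at the original input.
html = """<h1>A header</h1>
<p>Just a normal paragraph before</p>
<p>\u2022 test element</p>
<p>\u2022 test element2</p>
<p>Following stuff</p>"""


def is_bullet(tag):
    """True when the tag's visible text starts with the bullet character."""
    return tag.get_text(strip=True).startswith(BULLET)


def strip_bullet(tag):
    """Drop the leading bullet and surrounding spaces from the first text node."""
    first = tag.find(string=True)
    if first is not None:
        first.replace_with(first.lstrip().lstrip(BULLET).lstrip())


def bullets_to_lists(soup):
    """Group each run of adjacent top-level bullet tags under one new <ul>, in place."""
    current_ul = None
    # find_all returns a list, so moving tags while looping is safe.
    for tag in soup.find_all(True, recursive=False):
        if is_bullet(tag):
            if current_ul is None:
                # First bullet of a run: create the <ul> where the run starts.
                current_ul = soup.new_tag("ul")
                tag.insert_before(current_ul)
            strip_bullet(tag)
            tag.name = "li"          # rename the tag instead of rebuilding it
            current_ul.append(tag)   # append() moves the tag into the <ul>
        else:
            current_ul = None        # a non-bullet tag ends the current run
    return soup


soup = bullets_to_lists(BeautifulSoup(html, "html.parser"))
print(soup)
```

With the assumed input, the two bulleted paragraphs should end up as <li> children of a single <ul> placed between the normal paragraph and the following one. Whitespace-only strings between bullets do not break a run, because the loop only looks at tags; bare text sitting between two bullets would be ignored in the same way, so that case would need extra handling.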
