artomason

Reputation: 4013

Skip adding a line break if the next tag contains text in BeautifulSoup4

I'm trying to prevent BeautifulSoup from adding a line break if the next tag contains the text "Utility".

<html>
    <dl>
        <dt>RandomText</dt>  <!-- Line Break -->
        <dt>RandomText</dt>  <!-- Don't insert Line Break -->
        <dt>Utility: NonStaticText</dt>  <!-- Line Break  -->
    </dl>
</html>

Right now I have:

soup.unwrap('head')

for dt in soup.find_all('dt'):
    dt.insert_after('\n')

This is very minimal, but how would I go about this? The text "Utility:" occurs frequently, but the content after "Utility:" is different in every case, and is contained within the tag. I'm using BS4.

UPDATE:

I have found that:

import re

for dt in soup.find_all('dt'):
    if not dt.find(string=re.compile('Utility')):
        dt.insert_before('\n')

seems to work partially, but it tests the current tag rather than the one after it. What I really need is to look at the next tag in the tree, check whether it contains the string 'Utility', and base my decision on that. Ideally ...

dt.insert_before('\n')

should be:

dt.insert_after('\n')

UPDATE 2:

This was the solution for me:

for dt in soup.find_all('dt'):
    next_tag = dt.find_next('dt')  # None when there is no following <dt>

    try:
        if not next_tag.text.startswith('Utility'):
            dt.insert_after('\n')

    except AttributeError:  # next_tag was None, so nothing to compare against
        pass
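As a self-contained check of the loop above, here is a minimal sketch. It assumes the `html.parser` backend and uses the markup from the question collapsed onto one line, so the inserted line breaks are easy to spot. The `add_breaks` helper and the `HTML` constant are hypothetical names used only for illustration. It tests `next_dt is not None` explicitly instead of catching `AttributeError`, which should behave the same way.

```python
from bs4 import BeautifulSoup

# Markup from the question, without whitespace between tags (an assumption
# made here so that only the inserted '\n' characters show up in the output).
HTML = (
    "<dl>"
    "<dt>RandomText</dt>"
    "<dt>RandomText</dt>"
    "<dt>Utility: NonStaticText</dt>"
    "</dl>"
)


def add_breaks(html):
    """Insert '\\n' after each <dt> unless the next <dt> starts with 'Utility'."""
    soup = BeautifulSoup(html, "html.parser")
    for dt in soup.find_all("dt"):
        next_dt = dt.find_next("dt")  # None for the last <dt>
        if next_dt is not None and not next_dt.get_text().startswith("Utility"):
            dt.insert_after("\n")
    return str(soup)


result = add_breaks(HTML)
print(result)
```

With this input only the first `<dt>` should get a trailing line break: the second is followed by a "Utility" entry, and the last has no following `<dt>`, so it is skipped. If a break after the last entry is wanted, the `None` case would need to be treated as "not Utility" instead.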

Upvotes: 0

Views: 319

Answers (1)

t.m.adam

Reputation: 15376

You can get the next tag with the find_next method. For example:

for dt in soup.find_all('dt'):
    next_tag = dt.find_next()  # returns None when no tag follows
    if next_tag is not None and not next_tag.text.startswith('Utility:'):
        dt.insert_after('\n')

Note that if you don't pass any arguments to find_next, it will match any tag that follows, not just the next dt.
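To make the difference concrete, here is a small sketch using hypothetical markup in which a `<dd>` sits between two `<dt>` tags (the question's HTML has no `<dd>`, so this is an assumption for illustration). It compares `find_next()` with no arguments against `find_next('dt')`, using the `html.parser` backend.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <dd> separates the two <dt> tags.
html = "<dl><dt>RandomText</dt><dd>Details</dd><dt>Utility: X</dt></dl>"
soup = BeautifulSoup(html, "html.parser")
first_dt = soup.find("dt")

any_next = first_dt.find_next()     # next tag of any name
dt_next = first_dt.find_next("dt")  # next <dt> only

print(any_next.name, dt_next.name)
```

In a document where other tags can appear between entries, passing `'dt'` is probably the safer choice, since the "Utility" check then always runs against the next entry rather than whatever tag happens to come first.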

Upvotes: 1
