Reputation: 4013
I'm trying to prevent BeautifulSoup from adding a line break if the next tag contains the text "Utility".
<html>
<dl>
<dt>RandomText</dt> <!-- Line Break -->
<dt>RandomText</dt> <!-- Don't insert Line Break -->
<dt>Utility: NonStaticText</dt> <!-- Line Break -->
</dl>
</html>
Right now I have:
soup.unwrap('head')
for dt in soup.findAll('dt'):
    dt.insert_after('\n')
This is very minimal, but how would I go about this? The text "Utility:" occurs frequently, but the content after "Utility:" is different in every case, and is contained within the tag. I'm using BS4.
UPDATE:
I have found that:
import re

for dt in soup.find_all('dt'):
    if not dt.find(string=re.compile('Utility')):
        dt.insert_before('\n')
seems to somewhat work. What I really need is to look at the next tag in the tree, check whether it contains the string 'Utility', and base my decision on that. Ideally ...
dt.insert_before('\n')
should be:
dt.insert_after('\n')
UPDATE 2:
This was the solution for me:
for dt in soup.find_all('dt'):
    next_tag = dt.find_next('dt')
    try:  # raises AttributeError when there is no next <dt> (next_tag is None)
        if not next_tag.text.startswith('Utility'):
            dt.insert_after('\n')
    except AttributeError:
        pass
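For reference, here is a self-contained sketch of the same idea that checks for None explicitly instead of catching the exception. The HTML constant, the add_breaks helper name, and the choice of the html.parser backend are assumptions made for illustration, not part of my original script:

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question (whitespace removed so the
# inserted line breaks are easy to see).
HTML = (
    "<html><dl>"
    "<dt>RandomText</dt>"
    "<dt>RandomText</dt>"
    "<dt>Utility: NonStaticText</dt>"
    "</dl></html>"
)


def add_breaks(html):
    """Insert '\n' after each <dt> unless the next <dt> starts with 'Utility'."""
    soup = BeautifulSoup(html, "html.parser")
    for dt in soup.find_all("dt"):
        next_dt = dt.find_next("dt")
        if next_dt is None:
            # Last <dt> in the document: nothing follows, so skip it,
            # matching the try/except version above.
            continue
        if not next_dt.get_text().startswith("Utility"):
            dt.insert_after("\n")
    return str(soup)


result = add_breaks(HTML)
print(result)
```

With this sample, only the first dt gets a line break after it, because the dt that follows the second one starts with "Utility" and the third has no following dt.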
Upvotes: 0
Views: 319
Reputation: 15376
You can get the next tag with the find_next method, for example:
for dt in soup.find_all('dt'):
    next_tag = dt.find_next()
    if not next_tag.text.startswith('Utility:'):
        dt.insert_after('\n')
Note that if you don't pass any arguments to find_next, it will match any tag that follows.
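To illustrate that difference, here is a small sketch. The markup with a nested b tag is made up for the demonstration and html.parser is just an assumed parser choice. Without arguments, find_next can return a tag nested inside the current dt, while find_next('dt') skips ahead to the following dt:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <dt> contains a nested <b> tag.
soup = BeautifulSoup(
    "<dl><dt>Plain <b>bold</b></dt><dt>Utility: x</dt></dl>",
    "html.parser",
)

first_dt = soup.find("dt")

# No arguments: the next tag in document order, which here is the nested <b>.
any_tag = first_dt.find_next()

# With a tag name: the next <dt>, regardless of what lies in between.
next_dt = first_dt.find_next("dt")

print(any_tag.name)        # b
print(next_dt.get_text())  # Utility: x
```

So if your dt elements can contain other tags, passing 'dt' is the safer choice. Also keep in mind that find_next returns None when nothing follows, so the last dt needs a None check before you touch .text.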
Upvotes: 1