Reputation: 575
I have a soup in Python like this:
<p>
<span style="text-decoration: underline; color: #3366ff;">
Title:
</span>
Info
</p>
<p>
<span style="color: #3366ff;">
<span style="text-decoration: underline;">
Title2:
</span>
</span>
Info2
</p>
I'd like to get it to look like this:
<p>
Title:
Info
</p>
<p>
Title2:
Info2
</p>
Is there a way to do this with bs4?
Upvotes: 9
Views: 18164
Reputation: 11
I wrote this function, in case it helps:
def deleteBalise(string):
    # "balise" is French for "tag": each pass removes the first tag found,
    # so two passes strip one opening/closing pair.
    for i in range(2):
        # identifying <
        rankBegin = 0
        for carac in string:
            if carac == '<':
                break
            rankBegin += 1
        # identifying >
        rankEnd = 0
        for carac in string:
            if carac == '>':
                break
            rankEnd += 1
        stringToReplace = string[rankBegin:rankEnd+1]
        string = string.replace(stringToReplace,'')
    return string
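The function above handles only two tags per call. As a rough sketch of the same idea extended to any number of tags, something like the following might work. The name `strip_tags` is hypothetical, and this version assumes no `>` character appears inside an attribute value. Note that it removes every tag, including `<p>`, so it does not produce exactly the output asked for in the question.

```python
def strip_tags(text):
    """Remove every <...> tag from text, keeping the text between tags."""
    while True:
        start = text.find('<')
        end = text.find('>', start + 1)
        # Stop when there is no complete tag left.
        if start == -1 or end == -1:
            return text
        # Cut out the tag, keep everything around it.
        text = text[:start] + text[end + 1:]


print(strip_tags('<p><span style="color: #3366ff;">Title:</span> Info</p>'))
```

For real HTML, a parser such as bs4 is usually more robust than string slicing.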
Upvotes: 1
Reputation: 9636
You can also use replace_with to remove span tags. Note that replacing a span with an empty string drops the span together with the text inside it:
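from bs4 import BeautifulSoup

# html is assumed to hold the markup from the question
soup = BeautifulSoup(html, "html.parser")
for span_tag in soup.find_all('span'):
    # removes the span and everything inside it
    span_tag.replace_with('')
print(soup)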
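If the goal is to keep the text inside each span, as in the question, `unwrap()` may be a closer fit: it replaces a tag with its own contents. A minimal sketch, assuming the markup from the question is stored in a variable named `html` (compacted here for brevity) and that the built-in `html.parser` is acceptable:

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question (whitespace compacted).
html = (
    '<p><span style="text-decoration: underline; color: #3366ff;">Title:</span> Info</p>\n'
    '<p><span style="color: #3366ff;"><span style="text-decoration: underline;">'
    'Title2:</span></span> Info2</p>'
)

soup = BeautifulSoup(html, "html.parser")
for span_tag in soup.find_all("span"):
    # unwrap() removes the tag itself but leaves its children in place.
    span_tag.unwrap()

result = str(soup)
print(result)
```

Nested spans are handled as well, since each span is unwrapped in turn and its children stay in the tree.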
Upvotes: 7