Charles

Reputation: 575

Removing span tags from soup BeautifulSoup/Python

I have a soup in Python like this:

<p>
 <span style="text-decoration: underline; color: #3366ff;">
   Title:
 </span>
 Info
</p>
<p>
 <span style="color: #3366ff;">
  <span style="text-decoration: underline;">
   Title2:
  </span>
 </span>
 Info2
</p>

I'd like to get it to look like this:

<p>
   Title:
 Info
</p>
<p>
   Title2:
 Info2
</p>

Is there a way to do this with bs4?

Upvotes: 9

Views: 18164

Answers (3)

Kantin Itzlé

Reputation: 11

I wrote this function, in case it helps:

def deleteBalise(string):
    # Remove the first two tags ("balises") found in the string,
    # e.g. the opening and closing tag of one non-nested element.
    for i in range(2):
        rankBegin = string.find('<')                # position of the first '<'
        rankEnd = string.find('>', rankBegin + 1)   # first '>' after that '<'
        if rankBegin == -1 or rankEnd == -1:
            break  # no complete tag left
        # cut out everything from '<' through '>'
        string = string[:rankBegin] + string[rankEnd + 1:]
    return string

Upvotes: 1

avi

Reputation: 9636

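You can also use replace_with to remove span tags. Note that this removes the text inside each span as well:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # html holds the markup to clean
for span_tag in soup.find_all('span'):
    span_tag.replace_with('')
print(soup)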
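
To see what the snippet above does to the contents of a span, here is a minimal self-contained sketch. The one-line `html` string is a shortened stand-in for the markup in the question (an assumption for illustration), not the asker's full document.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (illustrative only).
html = '<p><span style="color: #3366ff;">Title:</span> Info</p>'

soup = BeautifulSoup(html, "html.parser")
for span_tag in soup.find_all("span"):
    # Replaces the whole element, including its children, with an empty string.
    span_tag.replace_with("")

result = str(soup)
print(result)
```

The printed markup no longer contains "Title:", so this approach fits when the span's text should go away too, not when only the tag should be stripped.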

Upvotes: 7

Aaron

Reputation: 2351

You'll want to use BeautifulSoup's unwrap() for this. It removes each tag but keeps its contents in place.

import bs4
soup1 = bs4.BeautifulSoup(htm1, 'html.parser')
for match in soup1.find_all('span'):
    match.unwrap()
print(soup1)
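
For a version that runs as is, the sketch below applies the same loop to the markup from the question. The variable names `html`, `soup`, and `text` are chosen here for illustration.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, including the nested spans.
html = """<p>
 <span style="text-decoration: underline; color: #3366ff;">
   Title:
 </span>
 Info
</p>
<p>
 <span style="color: #3366ff;">
  <span style="text-decoration: underline;">
   Title2:
  </span>
 </span>
 Info2
</p>"""

soup = BeautifulSoup(html, "html.parser")
for span in soup.find_all("span"):
    span.unwrap()  # drop the tag itself, leave its children where they were

text = soup.get_text()
print(soup)
```

Nested spans are handled too: once the outer span is unwrapped, the inner one becomes a direct child of the paragraph and is unwrapped on its own turn. The whitespace that surrounded the removed tags stays in the output.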

Upvotes: 25
