huangkaiyi

Reputation: 123

Remove a specific Tag with Beautiful Soup

I'm trying to write a Python script with Beautiful Soup to change the camera in an Indigo file (.igs), but I've run into a problem:

<scenedata>
   <tonemapping> <camera>...</camera> </tonemapping>
   <camera>...</camera>
</scenedata>

I would like to remove only the "camera" tag that is not inside the "tonemapping" tag.

I tried soup.find('').replace_with and soup.select('camera'), but it always removes all the camera tags.
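
To make the symptom concrete, here is a minimal reproduction. It is only a sketch, assuming the built-in html.parser and the sample markup above: a bare camera selector matches every camera tag regardless of its parent, so removing each match removes both.

```python
from bs4 import BeautifulSoup

data = """
<scenedata>
   <tonemapping> <camera>...</camera> </tonemapping>
   <camera>...</camera>
</scenedata>
"""
soup = BeautifulSoup(data, "html.parser")

# A bare type selector matches <camera> at any depth in the document.
matches = soup.select("camera")

# Record the parent of each match to show that both tags were selected.
parents = [cam.parent.name for cam in matches]
print(parents)  # ['tonemapping', 'scenedata']
```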

Upvotes: 1

Views: 2899

Answers (4)

HedgeHog

Reputation: 25073

Select the <camera> that is not a child of <tonemapping> with a CSS selector.

Option#1 - If there is only one tag:

soup.select_one(':not(tonemapping) > camera').extract()

Option#2 - If there are multiple tags:

for cam in soup.select(':not(tonemapping) > camera'):
    cam.extract()

Example

from bs4 import BeautifulSoup
data="""
<scenedata>
   <tonemapping> <camera>...</camera> </tonemapping>
   <camera>...</camera>
</scenedata>
"""
soup = BeautifulSoup(data, "html.parser")

# Remove every <camera> whose parent is not <tonemapping>
for cam in soup.select(':not(tonemapping) > camera'):
    cam.extract()

print(soup)

Output

<scenedata>
<tonemapping> <camera>...</camera> </tonemapping>

</scenedata>
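
Since the goal in the question is to change the camera rather than just delete it, the same selector can probably be combined with replace_with. A minimal sketch, assuming the sample markup from the question; the text "new settings" is a made-up placeholder, not real Indigo camera data:

```python
from bs4 import BeautifulSoup

data = """
<scenedata>
   <tonemapping> <camera>...</camera> </tonemapping>
   <camera>...</camera>
</scenedata>
"""
soup = BeautifulSoup(data, "html.parser")

# Find the first <camera> whose parent is not <tonemapping>.
old_cam = soup.select_one(":not(tonemapping) > camera")

# Build a replacement tag; "new settings" is a placeholder value.
new_cam = soup.new_tag("camera")
new_cam.string = "new settings"

# Swap the old tag for the new one in place.
old_cam.replace_with(new_cam)

print(soup)
```

Unlike decompose(), extract() and replace_with() hand back the removed tag, which is convenient if the old camera settings are still needed afterwards.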

Upvotes: 1

user16704247

Reputation:

You can try this.

from bs4 import BeautifulSoup

html_doc="""
<scenedata>
   <tonemapping> <camera>...</camera> </tonemapping>
   <camera>...</camera>
</scenedata>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

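# Select the <camera> that immediately follows <tonemapping> as a sibling
s = soup.select_one('scenedata tonemapping + camera')
# decompose() removes the tag in place and returns None, so nothing is assigned
s.decompose()
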
print(soup)

Output

<scenedata>
<tonemapping> <camera>...</camera> </tonemapping>

</scenedata>
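
One caveat: + is the adjacent sibling combinator, so tonemapping + camera only matches when <camera> comes immediately after <tonemapping>. If another element may sit between them, the general sibling combinator ~ is a possible alternative. A sketch, where the <renderer> element is a hypothetical stand-in for whatever else the file contains:

```python
from bs4 import BeautifulSoup

html_doc = """
<scenedata>
   <tonemapping> <camera>...</camera> </tonemapping>
   <renderer>...</renderer>
   <camera>...</camera>
</scenedata>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Adjacent sibling: no match here, because <renderer> sits in between.
adjacent = soup.select_one("scenedata tonemapping + camera")
print(adjacent)  # None

# General sibling: matches a later <camera> sibling of <tonemapping>.
general = soup.select_one("scenedata tonemapping ~ camera")
general.decompose()

print(soup)
```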

Upvotes: 0

Tor G

Reputation: 366

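You need to adapt it to your case (as written, this removes the <camera> inside <tonemapping>), but try something along the lines of:

    from bs4 import BeautifulSoup

    # build soup (html_string holds the file contents)
    soup = BeautifulSoup(html_string, features='html.parser')

    # select the first <tonemapping> tag
    map_tag = soup.select_one('tonemapping')

    # select the first <camera> tag inside the selected <tonemapping> tag
    camera_tag = map_tag.select_one('camera')

    # remove the selected <camera> tag
    camera_tag.decompose()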
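
One way to adapt the snippet above to the question (removing the <camera> outside <tonemapping>) might be to search only the direct children of <scenedata> with recursive=False. A sketch, assuming <camera> is a direct child of <scenedata> as in the sample markup:

```python
from bs4 import BeautifulSoup

html_string = """
<scenedata>
   <tonemapping> <camera>...</camera> </tonemapping>
   <camera>...</camera>
</scenedata>
"""
soup = BeautifulSoup(html_string, features="html.parser")

# select the first <scenedata> tag
scene_tag = soup.select_one("scenedata")

# recursive=False limits the search to direct children of <scenedata>,
# so the <camera> nested inside <tonemapping> is never visited
for camera_tag in scene_tag.find_all("camera", recursive=False):
    camera_tag.decompose()

print(soup)
```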

Upvotes: 0

anymous

Reputation: 52

Just check parent name and remove what you don't need.

import bs4

text = """
<scenedata>
   <tonemapping> <camera>...</camera> </tonemapping>
   <camera>...</camera>
</scenedata>
"""
soup = bs4.BeautifulSoup(text, features="lxml")

for cam in soup.select("camera"):
    if cam.parent.name != "tonemapping":
        cam.extract()
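
Note that cam.parent only looks one level up. If a <camera> could be nested deeper inside <tonemapping>, find_parent checks every ancestor instead. A sketch using the built-in html.parser; the <group> wrapper is hypothetical and only there to show the deeper nesting:

```python
import bs4

text = """
<scenedata>
   <tonemapping> <group> <camera>...</camera> </group> </tonemapping>
   <camera>...</camera>
</scenedata>
"""
soup = bs4.BeautifulSoup(text, features="html.parser")

for cam in soup.select("camera"):
    # find_parent returns None when no ancestor is named "tonemapping"
    if cam.find_parent("tonemapping") is None:
        cam.extract()

print(soup)
```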

Upvotes: 1
