Parsing SVG in Python

Question

I have an SVG file and an HTML file, each containing a couple of JavaScript script tags. I need to find all the script tags, insert a comment before the first one, and then insert another comment after the last one. I am trying to do this with BeautifulSoup. It works for the HTML version, but for the SVG it throws an error.

# For the HTML version of the file; this works as expected (BeautifulSoup 3 API)
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(data, selfClosingTags=['link', 'meta'])
scripts = soup.findAll('script')
for num, tag in enumerate(scripts):
    if num == 0:
        soup.head.insert(-1, startcomment)
    # Move the script tag to the end of <head>
    tag.extract()
    soup.head.insert(len(soup.head.contents), tag)
    if num == len(scripts) - 1:
        soup.head.insert(-1, endcomment)

But when I try the same thing for the SVG with soup = BeautifulSoup(data, "xml"), it throws an exception on that very first line. SVG is also XML, so shouldn't I be able to do it the same way?

Update - SVG format

should be changed to

Martijn Pieters · Accepted Answer

Use BeautifulSoup version 4, not 3, and install lxml to handle XML parsing.
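A minimal sketch of the parsing step with BeautifulSoup 4, assuming both bs4 and lxml are installed. The wrapper name parse_svg is hypothetical; FeatureNotFound is the exception bs4 raises when no installed tree builder supports the requested feature, which is what happens if you ask for 'xml' without lxml:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_svg(data):
    """Parse SVG markup using BeautifulSoup 4's XML mode (hypothetical helper)."""
    try:
        # The 'xml' feature is provided by lxml's XML parser
        return BeautifulSoup(data, 'xml')
    except FeatureNotFound as exc:
        # No installed tree builder supports 'xml', most likely lxml is missing
        raise RuntimeError("XML parsing needs lxml: pip install lxml") from exc


soup = parse_svg('<svg xmlns="http://www.w3.org/2000/svg"><script>x</script></svg>')
print(soup.find_all('script'))
```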

As of version 4.3.2, BeautifulSoup ignores processing instructions (like the instruction); see bug 1294645. You can work around this by patching the tree builder:

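# Work-around for BeautifulSoup < 4.4: keep XML processing instructions
from bs4.builder import LXMLTreeBuilderForXML
from bs4 import ProcessingInstruction

def handle_pi(self, target, data):
    # lxml calls pi() for each processing instruction; store it as a
    # ProcessingInstruction node instead of silently dropping it
    self.soup.endData()
    self.soup.handle_data(target + ' ' + data)
    self.soup.endData(ProcessingInstruction)

LXMLTreeBuilderForXML.pi = handle_pi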

The bug has since been marked as solved, and as of BeautifulSoup 4.4 (released July 2015) you no longer need the above work-around.
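If the same script has to run under both old and new BeautifulSoup releases, one option is to apply the tree-builder patch only on versions older than 4.4. The helper name needs_pi_workaround is hypothetical, and this sketch compares only the major and minor version numbers:

```python
import re


def needs_pi_workaround(version):
    """Return True for BeautifulSoup releases older than 4.4 (hypothetical helper)."""
    # Take the leading "major.minor" digits, e.g. "4.3.2" -> (4, 3), "4.13.0b2" -> (4, 13)
    match = re.match(r'(\d+)\.(\d+)', version)
    if match is None:
        raise ValueError('unrecognised version string: %r' % version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (4, 4)


print(needs_pi_workaround('4.3.2'))
```

You would then guard the LXMLTreeBuilderForXML.pi = handle_pi assignment with if needs_pi_workaround(bs4.__version__): so the patch is skipped on 4.4 and later.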

You want to store the list of script tags in a variable so you can access the first and last tag without looping:

from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup(data, 'xml')  # the 'xml' parser requires lxml
start_comment = soup.new_string('some comment', Comment)
end_comment = soup.new_string('end comment', Comment)

script_tags = soup.find_all('script')
if script_tags:  # avoid an IndexError when the document has no scripts
    script_tags[0].insert_before(start_comment)
    script_tags[-1].insert_after(end_comment)

For your sample SVG document, this results in:

>>> print(soup.prettify())
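Putting the pieces together, here is a self-contained sketch you can run against a small, made-up SVG. SAMPLE_SVG and wrap_scripts are hypothetical names, not part of the question; the sketch assumes BeautifulSoup 4 with lxml and skips the insertion when the document has no script tags:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample SVG with two <script> elements (not the asker's file)
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <script>var a = 1;</script>
  <rect width="10" height="10"/>
  <script>var b = 2;</script>
</svg>"""


def wrap_scripts(data, start_text='some comment', end_text='end comment'):
    """Insert a comment before the first <script> and after the last one."""
    soup = BeautifulSoup(data, 'xml')
    script_tags = soup.find_all('script')
    if script_tags:  # nothing to wrap if the document has no scripts
        script_tags[0].insert_before(soup.new_string(start_text, Comment))
        script_tags[-1].insert_after(soup.new_string(end_text, Comment))
    return soup


soup = wrap_scripts(SAMPLE_SVG)
print(soup.prettify())
```

Leaving out the formatter argument lets BeautifulSoup pick its XML formatter automatically, because the soup was built with the 'xml' parser.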
