Reputation: 7423
I have an SVG file and an HTML file, each containing several JavaScript script tags. I need to find all the script tags and insert one comment before the first script tag and another after the last script tag. I am trying to do this with BeautifulSoup. It works well for the HTML file, but for the SVG it throws an error.
# For the HTML version of the file, this works as expected
soup = BeautifulSoup(data, selfClosingTags=['link', 'meta'])
for num, tag in enumerate(soup.findAll('script')):
    if num == 0:
        soup.head.insert(-1, startcomment)
    tag.extract()
    soup.head.insert(-1, tag)
    if num == len(soup.findAll('script')) - 1:
        soup.head.insert(-1, endcomment)
But when I try the same thing for the SVG, using soup = BeautifulSoup(data, "xml"), it throws an exception on that first line. SVG is also XML, so shouldn't I be able to handle it the same way?
Update - SVG format
<?xml version="1.0"?>
<?xml-stylesheet href="../../../some.css" type="text/css"?>
<svg id="mycontent" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:svg="http://www.w3.org/2000/svg" version="1.2" baseProfile="tiny" focusable="true" onload="Jsfunction.load()">
<script xlink:href="../first.js" />
<script xlink:href="../second.js" />
<script xlink:href="../third.js" />
</svg>
This should be changed to:
<?xml version="1.0"?>
<?xml-stylesheet href="../../../some.css" type="text/css"?>
<svg id="mycontent" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:svg="http://www.w3.org/2000/svg" version="1.2" baseProfile="tiny" focusable="true" onload="Jsfunction.load()">
<!-- some comment -->
<script xlink:href="../first.js" />
<script xlink:href="../second.js" />
<script xlink:href="../third.js" />
<!-- end comment -->
</svg>
Upvotes: 0
Views: 4080
Reputation: 1121942
Use BeautifulSoup version 4, not 3, and install lxml to handle XML parsing.
As of version 4.3.2, BeautifulSoup ignores processing instructions (like the <?xml-stylesheet?> instruction); see bug 1294645. You can work around this by patching the tree builder:
from bs4.builder import LXMLTreeBuilderForXML
from bs4 import ProcessingInstruction

def handle_pi(self, target, data):
    self.soup.endData()
    self.soup.handle_data(target + ' ' + data)
    self.soup.endData(ProcessingInstruction)

LXMLTreeBuilderForXML.pi = handle_pi
The bug has since been marked as solved, and as of BeautifulSoup 4.4 (released July 2015) you no longer need the above work-around.
You want to store the list of script tags in a variable so you can access the first and last tag without looping:
from bs4 import BeautifulSoup, Comment
soup = BeautifulSoup(data, 'xml')
start_comment = soup.new_string('some comment', Comment)
end_comment = soup.new_string('end comment', Comment)
script_tags = soup.find_all('script')
script_tags[0].insert_before(start_comment)
script_tags[-1].insert_after(end_comment)
For your sample SVG document, this results in:
>>> print soup.prettify(formatter='xml')
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="../../../some.css" type="text/css"?>
<svg:svg baseProfile="tiny" focusable="true" id="mycontent" onload="Jsfunction.load()" version="1.2" xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<!--some comment-->
<svg:script xlink:href="../first.js"/>
<svg:script xlink:href="../second.js"/>
<svg:script xlink:href="../third.js"/>
<!--end comment-->
</svg:svg>
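If installing lxml is not an option, roughly the same insertion can be sketched with the standard library's xml.dom.minidom instead of BeautifulSoup. This is an alternative technique, not the approach shown above, and it assumes the input is well-formed XML. The helper name wrap_scripts and the shortened sample string are made up for illustration. The sketch also guards against a document with no script tags, where indexing with [0] and [-1] would raise an IndexError.

```python
from xml.dom import minidom

# Shortened version of the sample SVG from the question (illustrative only).
SVG = """<?xml version="1.0"?>
<?xml-stylesheet href="../../../some.css" type="text/css"?>
<svg id="mycontent" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.2">
<script xlink:href="../first.js" />
<script xlink:href="../second.js" />
<script xlink:href="../third.js" />
</svg>"""


def wrap_scripts(xml_text, start_text, end_text):
    """Return the XML with a comment before the first and after the last <script>.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    doc = minidom.parseString(xml_text)
    scripts = doc.getElementsByTagName('script')
    if not scripts:
        # Nothing to wrap; return the document unchanged.
        return doc.toxml()

    first, last = scripts[0], scripts[-1]
    first.parentNode.insertBefore(doc.createComment(start_text), first)
    # insertBefore(node, None) appends, so this also works when the last
    # script tag is the final child of its parent.
    last.parentNode.insertBefore(doc.createComment(end_text), last.nextSibling)
    return doc.toxml()


result = wrap_scripts(SVG, ' some comment ', ' end comment ')
print(result)
```

minidom keeps the xml-stylesheet processing instruction without any patching, but it does not add line breaks around the inserted comments, so the output layout will differ slightly from the prettified BeautifulSoup output.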
Upvotes: 2