Reputation: 1053
I have a script which pulls XML hosted online and saves it locally. The script then goes through the local file and replaces/adds certain text. However, for some reason, when I use the "&" symbol, there is an extra space inserted along with it within the element text. Here is a sample of the XML elements I am parsing:
<TrackingEvents>
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test
</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png?test=rewind_test
</Tracking>
However, after running my script to add the additional text to my elements, the text is added with an extra space, like this:
<TrackingEvents>
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test &cb={CACHEBUSTER}
</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png?test=rewind_test &cb={CACHEBUSTER}
</Tracking>
I have tried everything, but I don't know why this is occurring or how to prevent the space from being added. I have even tried stripping the whitespace. When I look at the XML that is saved locally before uploading it, everything looks fine (&amp; stands for the "&" symbol), as seen here in the source:
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test
&amp;cb={CACHEBUSTER}</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png?test=rewind_test
&amp;cb={CACHEBUSTER}</Tracking>
Here is what the code from my script looks like:
for URL, xml_name, original_server in tqdm(XML_tags):
    response = requests.get(URL)
    with open(xml_name, 'wb') as file:
        file.write(response.content)
    with open(xml_name) as saved_file:
        tree = ET.parse(saved_file)
        root = tree.getroot()
        for element in root.iter(tag=ET.Element):
            if element.text != None:
                if ".png" in element.text:
                    if "?" in element.text:
                        element.text = element.text + "&cb={CACHEBUSTER}"
                        element.text = element.text.strip()
                    else:
                        element.text = element.text + "?cb={CACHEBUSTER}"
                        element.text = element.text.strip()
            else:
                pass
    server = "example.server: ../sample/sample/" + original_server
    tree.write(xml_name, xml_declaration=True, method='xml',
               encoding='utf8')
    server_upload = subprocess.Popen(["scp", xml_name, server])
    upload_wait = os.waitpid(server_upload.pid, 0)
I can definitely use some help with this. Thanks.
Update: Actually, it appears that this has nothing to do with using the "&". Here is a sample when I just add different text:
<TrackingEvents>
<Tracking event="rewind">
http://www.example.com/rewind_1.png?test=rewind_test test123
</Tracking>
<Tracking event="pause">
http://www.example.com/pause_1.png?test=rewind_test test123
</Tracking>
</TrackingEvents>
Upvotes: 0
Views: 494
Reputation: 89305
The whitespace was already in the original XML before you appended anything to element.text; it is the newline between the last character of the text and the closing tag. Because strip() only removes leading and trailing whitespace, stripping after appending leaves that newline in the middle of the string. Strip the whitespace before appending instead:
....
if "?" in element.text:
    element.text = element.text.strip() + "&cb={CACHEBUSTER}"
else:
    element.text = element.text.strip() + "?cb={CACHEBUSTER}"
....
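To see the effect in isolation, here is a minimal sketch that uses the standard library's xml.etree.ElementTree on an inline string instead of a downloaded file. The SAMPLE document and the add_cachebuster helper are made up for illustration and are not part of your script; the second URL has no query string so that both branches are exercised.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the XML in the question.
SAMPLE = """<TrackingEvents>
  <Tracking event="rewind">
    http://www.example.com/rewind_1.png?test=rewind_test
  </Tracking>
  <Tracking event="pause">
    http://www.example.com/pause_1.png
  </Tracking>
</TrackingEvents>"""


def add_cachebuster(text, param="cb={CACHEBUSTER}"):
    """Strip surrounding whitespace first, then append the parameter."""
    url = text.strip()
    separator = "&" if "?" in url else "?"
    return url + separator + param


root = ET.fromstring(SAMPLE)

# The raw text still carries the newlines and indentation from the document.
raw = root[0].text
print(repr(raw))

# Appending first and stripping afterwards leaves a newline mid-string.
broken = (raw + "&cb={CACHEBUSTER}").strip()
print(repr(broken))

# Stripping before appending gives a clean URL.
for element in root.iter("Tracking"):
    if element.text and ".png" in element.text:
        element.text = add_cachebuster(element.text)

# The serializer escapes "&" as "&amp;" in the written XML.
print(ET.tostring(root, encoding="unicode"))
```

The repr() output makes the stray newline visible, which is easy to miss when the XML is rendered and the newline collapses into a single space.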
Upvotes: 1