Reputation: 622
I have an xml file with the following line :
<CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE>
That I would like to replace with the following :
<CREATION_DATE>XXX</CREATION_DATE>
Thought it would be pretty straightforward using the re
module in the python script I'm supposed to modify. I did something of the sort:
if '</CREATION_DATE>' in ligne:
out_lines[i] = re.sub(r'(^.*<CREATION_DATE>).*(</CREATION_DATE>.*$)', r'\1XXX\2', ligne)
The field with the date is correctly replaced, but the trailing new line and indentation are lost in the process. I tried converting ligne
and the result of the sub
function to a raw string with .encode('string-escape')
, with no success. I am a noob in python, but I am a bit accustomed to regex's, and I really cannot see what it is I am doing wrong.
Upvotes: 0
Views: 342
Reputation: 622
As stated in comments, the variable ligne
was stripped of blanks and new lines with ligne = ligne.strip()
elsewhere in the code... I am not deleting my question though because alecxe's answer on the xml module is very informative.
Upvotes: 0
Reputation: 473803
An alternative, a simpler and a more reliable way to replace the text of an XML element would be to use an XML parser. There is even one in the Python Standard Library:
>>> import xml.etree.ElementTree as ET
>>>
>>> s = '<ROOT><CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE></ROOT>'
>>> root = ET.fromstring(s)
>>> root.find("CREATION_DATE").text = 'XXX'
>>> ET.tostring(root)
'<ROOT><CREATION_DATE>XXX</CREATION_DATE></ROOT>'
Upvotes: 2