Valentin B.

Reputation: 622

re.sub() does not keep blanks and new lines

I have an XML file with the following line:

           <CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE> 

I would like to replace it with the following:

           <CREATION_DATE>XXX</CREATION_DATE> 

I thought this would be pretty straightforward using the re module in the Python script I'm supposed to modify. I did something like this:

if '</CREATION_DATE>' in ligne:
    out_lines[i] = re.sub(r'(^.*<CREATION_DATE>).*(</CREATION_DATE>.*$)', r'\1XXX\2', ligne)

The date field is correctly replaced, but the trailing newline and the indentation are lost in the process. I tried converting ligne and the result of re.sub() to a raw string with .encode('string-escape'), with no success. I am new to Python but fairly used to regexes, and I really cannot see what I am doing wrong.

Upvotes: 0

Views: 342

Answers (2)

Valentin B.

Reputation: 622

As pointed out in the comments, the variable ligne had been stripped of its leading and trailing whitespace, including the newline, by a ligne = ligne.strip() call elsewhere in the code. I am not deleting my question, though, because alecxe's answer on the xml module is very informative.
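
For anyone who hits the same symptom, here is a minimal sketch of the effect. It assumes the line is read with its indentation and newline intact; the names kept and lost are only for illustration. The regex itself leaves the whitespace alone, and the loss only shows up once strip() has run:

```python
import re

pattern = r'(^.*<CREATION_DATE>).*(</CREATION_DATE>.*$)'
ligne = '           <CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE> \n'

# On the untouched line, group 1 captures the indentation and the
# final newline is simply not part of the match, so both survive.
kept = re.sub(pattern, r'\1XXX\2', ligne)

# After strip(), the whitespace is already gone before re.sub() runs.
lost = re.sub(pattern, r'\1XXX\2', ligne.strip())

print(repr(kept))
print(repr(lost))
```

If the stripped value is needed for other checks, one option is to keep it in a separate variable and run the substitution on the original line.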

Upvotes: 0

alecxe

Reputation: 473803

A simpler and more reliable way to replace the text of an XML element is to use an XML parser. There is even one in the Python Standard Library:

>>> import xml.etree.ElementTree as ET
>>> 
>>> s = '<ROOT><CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE></ROOT>'
>>> root = ET.fromstring(s)
>>> root.find("CREATION_DATE").text = 'XXX'
>>> ET.tostring(root)
'<ROOT><CREATION_DATE>XXX</CREATION_DATE></ROOT>'
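
Since the question is about keeping indentation and newlines, here is a rough sketch of the same idea on a multi-line document. The ROOT/ITEM layout is a made-up example (only CREATION_DATE comes from the question), and it assumes Python 3, where ET.tostring() returns bytes unless encoding='unicode' is passed. ElementTree keeps the whitespace between elements as text and tail values, so the layout should come back unchanged:

```python
import xml.etree.ElementTree as ET

# Hypothetical multi-line document; only CREATION_DATE is from the question.
doc = (
    '<ROOT>\n'
    '    <ITEM>\n'
    '        <CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE>\n'
    '    </ITEM>\n'
    '</ROOT>'
)

root = ET.fromstring(doc)

# iter() walks the whole tree, so nested elements are found as well.
for elem in root.iter('CREATION_DATE'):
    elem.text = 'XXX'

# encoding='unicode' makes tostring() return str instead of bytes.
result = ET.tostring(root, encoding='unicode')
print(result)
```

Unlike find(), which only looks at direct children for a plain tag name, iter() also reaches elements nested deeper in the tree.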

Upvotes: 2
