re.sub() does not keep blanks and new lines

Question

I have an XML file with the following line:

           <CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE>

I would like to replace it with the following:

           <CREATION_DATE>XXX</CREATION_DATE>

I thought this would be straightforward with the re module in the Python script I have to modify, so I did something like this:

if '<CREATION_DATE>' in ligne:
    out_lines[i] = re.sub(r'(^.*<CREATION_DATE>).*(</CREATION_DATE>.*$)', r'\1XXX\2', ligne)

The date field is replaced correctly, but the indentation and the trailing newline are lost in the process. I tried converting ligne and the result of re.sub() to a raw string with .encode('string-escape'), without success. I am new to Python but fairly used to regexes, and I really cannot see what I am doing wrong.

alecxe · Accepted Answer

A simpler and more reliable way to replace the text of an XML element is to use an XML parser. There is even one in the Python Standard Library:

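>>> import xml.etree.ElementTree as ET
>>>
>>> # ROOT is a placeholder wrapper element so that find() has a parent to search
>>> s = '<ROOT><CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE></ROOT>'
>>> root = ET.fromstring(s)
>>> root.find("CREATION_DATE").text = 'XXX'
>>> # on Python 3 the same content comes back as bytes (b'...')
>>> ET.tostring(root)
'<ROOT><CREATION_DATE>XXX</CREATION_DATE></ROOT>'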
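
Since the question is about indentation and newlines, here is a minimal sketch of the same approach on an indented, multi-line document. The RECORD wrapper element and the sample document are assumptions made for illustration, not taken from the original file, and the snippet targets Python 3 (where encoding='unicode' makes tostring() return a str). ElementTree keeps the whitespace between elements in their text and tail attributes, so it is written back unchanged:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the RECORD wrapper is an assumption,
# only the CREATION_DATE element comes from the question.
doc = (
    '<RECORD>\n'
    '           <CREATION_DATE>2009-12-20T10:47:07.000Z</CREATION_DATE>\n'
    '</RECORD>'
)

root = ET.fromstring(doc)

# Replace the text of every CREATION_DATE element, wherever it sits in the tree.
for elem in root.iter('CREATION_DATE'):
    elem.text = 'XXX'

# encoding='unicode' returns a str instead of bytes (Python 3).
result = ET.tostring(root, encoding='unicode')
print(result)
```

Only the element text changes; the indentation before the element and the newline after it are untouched because they were never part of the replaced value.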
