Reputation: 110267
I have the following text:
xml = '''
<accessibility_info>
<accessibility role="captions" available="true" />
</accessibility_info>
<crew_member billing="top">
<display_name>John Viscount</display_name>
</crew_member>
<products>
<territory>GB</territory>
</products>'''
I need to remove the <crew_member> block. This is what I am currently doing:
clean_xml = re.sub('<crew_member>.*</crew_member>', '', metadata_contents, flags=re.DOTALL)
However, this leaves an empty line where the block used to be:
<accessibility_info>
<accessibility role="captions" available="true" />
</accessibility_info>

<products>
<territory>GB</territory>
</products>
How would I change the regex to strip the newline as well, so it looks like:
<accessibility_info>
<accessibility role="captions" available="true" />
</accessibility_info>
<products>
<territory>GB</territory>
</products>
Upvotes: 0
Views: 3115
Reputation: 28
I know this is a little old, but the newline may actually come from the method used to write the new text out. If I use print(), a trailing newline is added, but if I use, for example, sys.stdout.write(), no newline is added.
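To illustrate the difference described here, the following is a minimal sketch that captures what each call actually sends to stdout. The `captured` helper and the sample `text` are made up for the demonstration. Note that this only concerns a newline at the very end of the output, not an empty line in the middle of the string.

```python
import io
import sys
from contextlib import redirect_stdout

# Hypothetical sample text, standing in for the cleaned XML.
text = "<products>\n<territory>GB</territory>\n</products>"

def captured(writer):
    """Call writer(text) and return exactly what it sent to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        writer(text)
    return buffer.getvalue()

# print() appends "\n" by default; write() and print(end="") do not.
with_print = captured(lambda s: print(s))
with_write = captured(lambda s: sys.stdout.write(s))
with_end = captured(lambda s: print(s, end=""))

print(repr(with_print[-12:]))
print(repr(with_write[-12:]))
```

If you want to keep using print(), passing end="" should give the same result as sys.stdout.write().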
Upvotes: 0
Reputation: 2677
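Try this. The pattern allows attributes on the opening tag and also consumes the newline after the closing tag:
print(re.sub(r'<crew_member([^>]*)>.*</crew_member>\n', '', xml, flags=re.DOTALL))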
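As a self-contained sketch of the same idea, the variant below runs against the sample from the question. It makes two changes that are my own assumptions rather than part of the answer above: `.*?` is non-greedy, so two separate `<crew_member>` blocks would not swallow everything between them, and `\n?` makes the trailing newline optional in case the block is the last thing in the text.

```python
import re

xml = '''
<accessibility_info>
<accessibility role="captions" available="true" />
</accessibility_info>
<crew_member billing="top">
<display_name>John Viscount</display_name>
</crew_member>
<products>
<territory>GB</territory>
</products>'''

# [^>]*  allows any attributes on the opening tag
# .*?    matches the block body lazily (DOTALL lets . match newlines)
# \n?    also removes the newline that follows the closing tag, if any
pattern = re.compile(r'<crew_member\b[^>]*>.*?</crew_member>\n?', flags=re.DOTALL)

clean_xml = pattern.sub('', xml)
print(clean_xml)
```

For anything beyond simple, non-nested blocks like this one, a real XML parser is likely to be more robust than a regex.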
Upvotes: 2