David542

Reputation: 110267

re.sub adding a newline in python

I have the following text:

xml = '''
<accessibility_info>
    <accessibility role="captions" available="true" />
</accessibility_info>
<crew_member billing="top">
    <display_name>John Viscount</display_name>
</crew_member>
<products>
    <territory>GB</territory>
</products>'''

I need to remove the <crew_member> block. This is what I am currently doing:

import re
clean_xml = re.sub('<crew_member>.*</crew_member>', '', xml,
                   flags=re.DOTALL)

However, the result still contains an extra blank line:

<accessibility_info>
    <accessibility role="captions" available="true" />
</accessibility_info>

<products>
    <territory>GB</territory>
</products>
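A minimal sketch of where the blank line comes from, assuming the sample text with the missing `>` added to the opening `<crew_member>` tag. The pattern below uses `[^>]*` so it also matches the tag's `billing` attribute (an adjustment for this sketch; the pattern above expects a bare `<crew_member>`). Because the match ends at `</crew_member>`, the newline that followed the closing tag is never part of the match and stays in the string.

```python
import re

# Sample text from the question, with the opening <crew_member> tag closed.
xml = '''
<accessibility_info>
    <accessibility role="captions" available="true" />
</accessibility_info>
<crew_member billing="top">
    <display_name>John Viscount</display_name>
</crew_member>
<products>
    <territory>GB</territory>
</products>'''

# [^>]* lets the opening tag carry attributes (assumption for this sketch).
pattern = r'<crew_member[^>]*>.*?</crew_member>'
cleaned = re.sub(pattern, '', xml, flags=re.DOTALL)

# The '\n' that ended the </accessibility_info> line and the '\n' that followed
# </crew_member> are now adjacent, which shows up as an empty line.
print(repr(cleaned))
```

Nothing is added by `re.sub`; the empty line is simply the newline that sat after the removed block.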

How would I change the regex to strip the newline as well, so that the result looks like this:

<accessibility_info>
    <accessibility role="captions" available="true" />
</accessibility_info>
<products>
    <territory>GB</territory>
</products>

Upvotes: 0

Views: 3115

Answers (2)

fedeman

Reputation: 28

I know this is a little old, but I would point out that the newline actually comes from the method used to write the new text to the file. If I use print(), a newline is added, but if I use, for example, sys.stdout.write(), no newline is added.
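A small sketch of the difference described above, capturing standard output with `contextlib.redirect_stdout` so the two results can be compared. `print()` appends its `end` argument (a newline by default) after the whole string, whereas `sys.stdout.write()` writes the string unchanged. The `text` value is just an illustrative fragment of the sample.

```python
import contextlib
import io
import sys

text = '<products>\n    <territory>GB</territory>\n</products>'

# print() writes the string followed by end='\n' (its default).
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print(text)
printed = buf.getvalue()

# sys.stdout.write() writes exactly the characters it is given.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    sys.stdout.write(text)
written = buf.getvalue()

print(repr(printed[-12:]))
print(repr(written[-12:]))
```

Passing `end=''` to `print()` suppresses the trailing newline in the same way. Note that this newline is appended after the whole string, so it affects the end of the output rather than the line where the removed block was.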

Upvotes: 0

Steve Peak

Reputation: 2677

Try this:

import re
print(re.sub(r'<crew_member[^>]*>.*?</crew_member>\n', '', xml, flags=re.DOTALL))
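A runnable sketch of this answer applied to the sample from the question (with the missing `>` added to the opening tag). It uses a raw string and `[^>]*` for the tag's attributes, and makes `.*` non-greedy; the non-greedy `.*?` is an adjustment in this sketch so that several `<crew_member>` blocks would each be removed separately, rather than everything from the first opening tag to the last closing tag.

```python
import re

xml = '''
<accessibility_info>
    <accessibility role="captions" available="true" />
</accessibility_info>
<crew_member billing="top">
    <display_name>John Viscount</display_name>
</crew_member>
<products>
    <territory>GB</territory>
</products>'''

# [^>]*   : any attributes on the opening tag
# .*?     : shortest run up to the next </crew_member> (re.DOTALL lets it span lines)
# \n      : also consume the newline after the closing tag, so no blank line remains
clean_xml = re.sub(r'<crew_member[^>]*>.*?</crew_member>\n', '', xml,
                   flags=re.DOTALL)

print(clean_xml)
```

If the block could be the last thing in the text with no newline after it, writing `\n?` instead of `\n` makes the newline optional.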

Upvotes: 2
