Reputation: 4240
I'm extracting text from an XML file and printing it to a text file using Python. Some lines in the XML file have '&#xD;' and '&#xA;' in them, which cause the line to be written to the text file with carriage returns and line feeds. There are answers here (a Ruby question about removing the same entities) and here https://stackoverflow.com/questions/28794365/remove-xd-from-xml on how to remove these characters in Ruby and PHP so that there are no line breaks. How do I do this in Python? Here is my code:
    with open("xmlfile") as f:
        doc = parse(f)
    str = doc.getElementsByTagName("informations")[0].getAttribute("text")
    print(str)
    str = str.replace("&#xD;", " ").replace("&#xA;", " ")
    print(str)
Here is the string in the XML file:
    "An Airport Contact Method, Is Alter must be one of the following:&#xD;&#xA;- "T" or "F" (boolean true or false) or empty" language="en"
Output:
An Airport Contact Method, Is Alter must be one of the following:
- "T" or "F" (boolean true or false) or empty
An Airport Contact Method, Is Alter must be one of the following:
- "T" or "F" (boolean true or false) or empty
Upvotes: 2
Views: 4157
Reputation: 21661
By the time whatever XML library you're using has parsed it, it's already resolved the entities.
Replace
    str = str.replace("&#xD;", " ").replace("&#xA;", " ")
with
    str = str.replace("\r", " ").replace("\n", " ")
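To see the difference end to end, here is a minimal sketch. It assumes the parser is xml.dom.minidom (the question does not show its import) and uses a made-up one-element document in place of the real file; the element and attribute names are copied from the question, everything else is illustrative.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the asker's file: one element whose "text"
# attribute contains the character references &#xD; and &#xA;.
XML = '<root><informations text="first line:&#xD;&#xA;- second line" language="en"/></root>'

doc = parseString(XML)
text = doc.getElementsByTagName("informations")[0].getAttribute("text")

# The parser has already turned the references into real characters,
# so searching for the literal entity text finds nothing to replace.
unchanged = text.replace("&#xD;", " ").replace("&#xA;", " ")

# Replacing the actual control characters removes the line break.
cleaned = text.replace("\r", " ").replace("\n", " ")

print(repr(unchanged))
print(repr(cleaned))
```

Note that each control character becomes its own space, so a "\r\n" pair leaves two spaces behind.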
Per @martineau's suggestion, if you're ever not sure what character an XML entity is resolving to, you can try print(repr(str)) to get a better picture of what the string actually contains once it's been parsed.
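As a small illustration of that tip, the sketch below uses a hard-coded sample string (an assumption standing in for the parsed attribute value) to show what print and repr each reveal. The last line is an optional alternative that is not part of the answer above: splitting on whitespace and rejoining collapses "\r\n" into a single space.

```python
# Sample value standing in for the parsed attribute (illustrative only).
text = "one of the following:\r\n- empty"

print(text)        # the control characters are invisible here
print(repr(text))  # the escapes \r and \n are spelled out

shown = repr(text)

# Optional alternative, not from the answer: collapse every run of
# whitespace (including \r\n) into one space.
collapsed = " ".join(text.split())
print(collapsed)
```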
Upvotes: 3