Reputation: 866
I have an XML file in which the value of one key needs to be a TAB character. Based on the question "Represent space and tab in XML tag", I encoded it as a numeric character reference (&#9;) rather than "\t", because "\t" was being interpreted as a string containing the two characters '\' and 't'.
I did not use a CDATA section, because "\t" inside CDATA would still be read as a string containing the two characters '\' and 't'.
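To make the difference concrete, here is a minimal sketch (Python 3, standard library only) comparing the three spellings. The `samples` dictionary and the `value_text` helper are hypothetical names used only for this illustration, and `&#9;` is assumed as the character reference for TAB.

```python
from xml.dom import minidom

# Three ways of writing the value. Only the character reference
# is expected to produce a real TAB character after parsing.
samples = {
    "char_ref": "<value>&#9;</value>",
    "backslash_t": r"<value>\t</value>",
    "cdata": r"<value><![CDATA[\t]]></value>",
}


def value_text(xml_source):
    """Return the text held by the first child of the root element."""
    root = minidom.parseString(xml_source).documentElement
    return root.firstChild.data


results = {name: value_text(src) for name, src in samples.items()}

for name, text in results.items():
    # repr() makes the TAB visible instead of printing blank space.
    print(name, repr(text), len(text))
```

Printing with `repr()` and `len()` shows whether a value is one TAB character or a backslash followed by `t`.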
The sample XML file of my use case looks like this
<?xml version="1.0" encoding="UTF-8"?>
<keys>
  <key>
    <name>key1</name>
    <value>value1</value>
  </key>
  <key>
    <name>key2</name>
    <value>&#9;</value>
  </key>
  <key>
    <name>key3</name>
    <value>2048</value>
  </key>
</keys>
This is the code that I have right now that is not able to handle this TAB character
...
dom_obj = minidom.parse(self.path_to_xml)
...
for each_key_child in key_child:
    if each_key_child.nodeType == Node.ELEMENT_NODE:
        if each_key_child.nodeName == 'name':
            node_name = str(each_key_child.childNodes[0].data.strip())
        elif each_key_child.nodeName == 'value':
            node_value = str(each_key_child.childNodes[0].data.strip())
        else:
            pass
    else:
        pass
The output that I get after the script is executed is
'key1': 'value1',
'key2': '',
'key3': '2048',
But when I execute it on the Python interactive interpreter
mobj = minidom.parse(path_to_xml_file)
mobj.getElementsByTagName("value")[1].childNodes[0]
I get the following output
<DOM Text node "u'\t'">
But I do not seem to be able to assign that value to a variable. This step does not appear to work:
node = mobj.getElementsByTagName("value")[1].childNodes[0].data
Strangely, though, when I just type node at the interpreter, it prints '\t':
node
u'\t'
To check whether the TAB character was really being stored in the variable and just not displayed, I used it as a separator to concatenate two strings.
This works fine at the interpreter but not in the script, whose output I inspected in vim with the :set list option.
Can anyone tell me what is wrong with my approach? Help appreciated!
Upvotes: 1
Views: 940
Reputation: 365707
You're calling strip(). This strips tabs. Just don't do that. (Or, if you need to strip spaces or newlines or something specific, but leave tabs, call it with a specific argument, like strip('\n').)
Here's a demonstration (faked, because your example XML isn't valid, so I can't test it):
>>> mobj.getElementsByTagName("value")[1].childNodes[0]
<DOM Text node "u'\t'">
>>> mobj.getElementsByTagName("value")[1].childNodes[0].data
u'\t'
>>> mobj.getElementsByTagName("value")[1].childNodes[0].data.strip()
u''
>>> mobj.getElementsByTagName("value")[1].childNodes[0].data.strip('\n')
u'\t'
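For a version that can be run end to end, here is a minimal sketch in Python 3. It assumes the TAB is written as the character reference `&#9;` and that every `name` and `value` element holds exactly one text node. `XML` and `read_keys` are hypothetical names for this illustration, not part of the original script.

```python
from xml.dom import minidom

# Assumed well-formed version of the sample file, with the TAB
# written as a numeric character reference.
XML = (
    "<keys>"
    "<key><name>key1</name><value>value1</value></key>"
    "<key><name>key2</name><value>&#9;</value></key>"
    "<key><name>key3</name><value>2048</value></key>"
    "</keys>"
)


def read_keys(xml_source, strip_chars=None):
    """Build a {name: value} dict from the <key> elements.

    strip_chars is passed straight to str.strip(): None strips all
    leading/trailing whitespace (tabs included), while an explicit
    string such as " \n" strips only those characters.
    """
    doc = minidom.parseString(xml_source)
    result = {}
    for key in doc.getElementsByTagName("key"):
        name = key.getElementsByTagName("name")[0].firstChild.data.strip()
        value = key.getElementsByTagName("value")[0].firstChild.data
        result[name] = value.strip(strip_chars)
    return result


# Default strip(): the TAB value is removed, as in the question.
stripped_all = read_keys(XML)

# Strip only spaces and newlines: the TAB value survives.
tabs_kept = read_keys(XML, " \n")

print(stripped_all)
print(tabs_kept)
```

Passing the characters to strip explicitly keeps the cleanup of stray spaces and newlines while leaving a value that consists only of a TAB intact. An empty `<value/>` element would have no `firstChild`, so real code would need a guard for that case.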
Upvotes: 3