I'm using ElementTree to compare a CSV file to an XML document. The script should update the tags when the title tag matches the first cell in the CSV. The title needs a non-breaking space (&#160;) to keep the text from wrapping when I import the XML into another program (InDesign).
XML input:
<Table_title>fatal crashes by&#160;time of day</Table_title>
<cell>data1</cell>
<cell>data2</cell>
<cell>data3</cell>
CSV input:
'fatal crashes by&#160;time of day', data1, data2, data3
However, when I read the XML into the ElementTree script using ET.parse('file.xml'), it converts the &#160; character reference into an actual non-breaking space character:
<Table_title>fatal crashes by time of day</Table_title>
<cell>data1</cell>
<cell>data2</cell>
<cell>data3</cell>
Which is exactly what it should do (I think). But in this scenario I actually want &#160; to stay a literal string, so that it matches the first cell of the CSV (when the CSV is read in, the cell is interpreted as a plain string: 'fatal crashes by&#160;time of day').
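A minimal sketch of the mismatch described above, assuming the XML title contains the &#160; reference and using inline strings in place of file.xml and the CSV file (the CSV text is an assumed, unquoted version of the sample row):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Inline stand-ins for file.xml and the CSV file (assumed contents)
xml_text = '<Table_title>fatal crashes by&#160;time of day</Table_title>'
csv_text = 'fatal crashes by&#160;time of day,data1,data2,data3\n'

# ElementTree decodes the character reference into the character U+00A0
xml_title = ET.fromstring(xml_text).text

# The csv module does no XML decoding, so the six characters "&#160;" stay as-is
csv_title = next(csv.reader(io.StringIO(csv_text)))[0]

print(repr(xml_title))
print(repr(csv_title))
print(xml_title == csv_title)  # False: the two strings differ at the space
```

The comparison fails because one string holds a single U+00A0 character where the other holds the literal text &#160;.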
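Is there a way to either keep &#160; as literal text when parsing the XML:
<Table_title>fatal crashes by&#160;time of day</Table_title>
or turn the &#160; in the CSV into a real non-breaking space:
'fatal crashes by&#160;time of day', data1, data2, data3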
Here is what happens.
You read this XML into ElementTree:
<Table_title>fatal crashes by&#160;time of day</Table_title>
ElementTree parses it and turns it into this DOM:
Table_title
"fatal crashes by・time of day"
(where ・ represents the character with code 160, i.e. the non-breaking space).
This is 100% correct and you can't (and should not want to) do anything about it.
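One reason not to fight this: the decoded character is not lost when the tree is written back out. A minimal sketch, using an inline string in place of file.xml: ElementTree's default us-ascii output encoding writes non-ASCII characters as numeric character references, so the non-breaking space comes back as &#160;, while encoding='unicode' writes the raw character instead.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring('<Table_title>fatal crashes by&#160;time of day</Table_title>')

# In memory the text holds a real U+00A0 character, not the six characters "&#160;"
print('\u00a0' in title.text)  # True

# Default encoding is us-ascii: non-ASCII characters are written as &#...; references
ascii_out = ET.tostring(title)
print(ascii_out)  # b'<Table_title>fatal crashes by&#160;time of day</Table_title>'

# With encoding='unicode' the U+00A0 character itself is written
unicode_out = ET.tostring(title, encoding='unicode')
print('\u00a0' in unicode_out)  # True
```

Either form is still a non-breaking space to any XML consumer reading the file.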
Your CSV also appears to contain a snippet of XML in its first column. However, the csv module does not parse it, so &#160; stays as six literal characters until you parse it yourself.
If you want to be able to compare the text values, you have no choice but to XML-parse the first column.
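import csv
import xml.etree.ElementTree as ET

# open your XML and CSV files ('file.csv' is a placeholder name)
tree = ET.parse('file.xml')
with open('file.csv', newline='', encoding='utf-8') as csv_file:
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        # wrap the cell in a dummy element so character references such as &#160; get decoded
        temp = ET.fromstring('<temp>' + row[0] + '</temp>')
        print(temp.text)
        # compare temp.text to your XML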
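Putting the pieces together, here is a sketch of the full match-and-update step. It assumes a hypothetical <Table> wrapper around the title and cells, an unquoted CSV row without spaces after the commas, and that "update the tags" means copying the remaining CSV values into the <cell> elements in order; inline strings stand in for the real files.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed layout: a hypothetical <Table> element wrapping the title and its cells
xml_text = """<Table>
<Table_title>fatal crashes by&#160;time of day</Table_title>
<cell>old1</cell>
<cell>old2</cell>
<cell>old3</cell>
</Table>"""
csv_text = 'fatal crashes by&#160;time of day,data1,data2,data3\n'

root = ET.fromstring(xml_text)
title = root.find('Table_title').text  # contains U+00A0 after parsing

for row in csv.reader(io.StringIO(csv_text)):
    # XML-parse the first column so &#160; becomes U+00A0, just like in the XML
    key = ET.fromstring('<temp>' + row[0] + '</temp>').text
    if key == title:
        # Copy the remaining CSV values into the <cell> elements, in order
        for cell, value in zip(root.findall('cell'), row[1:]):
            cell.text = value

print(ET.tostring(root, encoding='unicode'))
```

Because both sides are decoded the same way, the comparison works on real characters rather than on how they happened to be spelled in each file.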