I am trying to parse an XML file using ElementTree. The file was exported from MySQL, and when the database contains an entry like c:cygwin\bin, the export translates the '\b' into a backspace character. I am trying to delete every '\b' from the XML file so that I can pass it to the elementtree.parse() method. For some reason, after removing the '\b' characters, the entire file does not get written back out.
Here is what I am doing:
    def preprocess(file):
        #exporting from MySQL query browser adds a weird
        #character to the result set, remove it
        #so the XML parser can read the data
        print "in preprocess"
        lines = map(lambda line: line.replace("\b", " "), file)
        #go to the beginning of the file
        file.seek(0)
        #overwrite with correct data
        file.writelines(lines)
        sys.exit()

    '''Entry into the program'''
    #test the file to see if processing is needed before parsing
    for line in xml_file:
        p = re.compile("\\b") #search for '\b'
        if(p.match(line)):
            processing = True
            break #only one match needed
    if processing:
        preprocess(xml_file)
The result is an XML file with the header cut off, so the parser fails when the file is passed to it.
This is what gets cut out of the XML file:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE ROOT SYSTEM "diskreport.dtd">
    <ROOT>
    <row>
    <field name="buildid">26960</field>
    <field name="cast(status as char)">Filesystem 1K-blocks Used Available Use% Mounted on
    C:cygwinin 285217976 88055920 197162056 31% /usr/bin
Any help or ideas would be awesome. Thanks!
I figured out the problem: I was using p.match to look for '\b' when I really needed to be using p.search. p.match only matches at the beginning of the line, whereas p.search looks for occurrences anywhere in the line.
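To illustrate the match/search difference, here is a minimal Python 3 sketch. The sample line is hypothetical, modelled on the data in the question. One caveat seems worth flagging: in Python's re module, a bare \b outside a character class is a word-boundary assertion rather than a backspace, so the sketch spells the backspace as \x08. The last two lines show what the "\\b" pattern does on two sample strings, which may explain why the original check fired on the line starting with C: but not on the lines starting with <.

```python
import re

# A literal backspace character is "\x08" (what "\b" means in a normal
# Python string). In a regex, \b outside [] is a word boundary instead.
backspace = re.compile(r"\x08")

line = "C:cygwin\x08in 285217976"  # hypothetical line with an embedded backspace

at_start = backspace.match(line)   # match() only tries position 0
anywhere = backspace.search(line)  # search() scans the whole line

print(at_start)          # None
print(anywhere.start())  # 8

# The pattern from the posts, "\\b", is the regex \b (word boundary):
word_boundary = re.compile("\\b")
print(word_boundary.match("<ROOT>"))                # None: '<' is not a word character
print(word_boundary.match("C:cygwin") is not None)  # True: boundary before 'C'
```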
Solution:
    def preprocess(file):
        #exporting from MySQL query browser adds a weird
        #character to the result set, remove it
        #so the XML parser can read the data
        print "in preprocess"
        lines = map(lambda line: line.replace("\b", ""), file)
        #go to the beginning of the file
        file.seek(0)
        #overwrite with correct data
        file.writelines(lines)
        sys.exit()

    '''Entry into the program'''
    #test the file to see if processing is needed before parsing
    for line in xml_file:
        p = re.compile("\\b")
        if(p.search(line)): ####Changed to p.search here
            processing = True
            break #only one match needed
    if processing:
        preprocess(xml_file)
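For comparison, here is a sketch of a somewhat more defensive variant in Python 3. It is not the code from the post, and it rests on a few assumptions: the function name strip_backspaces is made up for illustration, the file is taken to be UTF-8 (as its XML declaration says), and it is small enough to read into memory. It reads the whole file from the start instead of continuing an iteration that has already consumed some lines, tests for the backspace with a plain substring check, and truncates after writing, since the cleaned text is shorter than the original.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

BACKSPACE = "\x08"  # the same character as "\b" in a normal string literal


def strip_backspaces(path):
    """Remove backspace characters from the file at `path`, in place.

    Returns True if the file was rewritten, False if it was already clean.
    """
    with open(path, "r+", encoding="utf-8", newline="") as f:
        text = f.read()           # read from the start, header included
        if BACKSPACE not in text:
            return False
        f.seek(0)                 # go back to the beginning
        f.write(text.replace(BACKSPACE, ""))
        f.truncate()              # drop leftover bytes past the new end
    return True


# Usage with a throwaway file (hypothetical content modelled on the question)
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<ROOT>C:cygwin\x08in</ROOT>\n')

changed = strip_backspaces(tmp.name)
with open(tmp.name, encoding="utf-8") as f:
    result = f.read()

root = ET.parse(tmp.name).getroot()  # parses once the backspace is gone
print(changed, root.text)
os.remove(tmp.name)
```

Because the file is read in one pass before anything is written, the header cannot be skipped by an earlier loop, and no separate "is processing needed" scan is required.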