Reputation: 64854
I'm using Digester to parse an XML file and I get the following error:
May 3, 2011 6:41:25 PM org.apache.commons.digester.Digester fatalError
SEVERE: Parse Fatal Error at line 2336608 column 3: The element type "user" must be terminated by the matching end-tag "</user>".
org.xml.sax.SAXParseException: The element type "user" must be terminated by the matching end-tag "</user>".
However, 2336608 is the last line of my file. I guess I'm opening a tag somewhere and never closing it. How can I find and fix it in a big text file?
Thanks
Upvotes: 2
Views: 8183
Reputation: 4695
I don't think you need to write a script to detect XML errors. You can use an online validator, for instance the W3Schools XML validator: http://www.w3schools.com/xml/xml_validator.asp
I just pasted a 15 MB XML file into it and fixed the problem quite easily. You can also supply the XML as a URL if you are able to upload it somewhere. Java reported the error at a place that looked fine, but this tool located the actual error, and after I corrected it, Java no longer complained.
There are many types of XML errors, and not all of them are related to the nesting structure, so it is best to use a well-known tool for this. For instance, my error was in an attribute (I was missing a quotation mark, "), but Java reported a nesting problem.
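If uploading the file to a website is not an option, a rough local equivalent is to run the JDK's built-in SAX parser (JAXP) over the file on its own, outside Digester, and print the position of the first well-formedness error. This is only a sketch: the class name WellFormedCheck is made up, and because it is not the same checker as the online validator, it reports the same kind of line/column information as the original exception. Its main use is confirming that the problem is in the XML itself rather than in the Digester rules.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks well-formedness only, no Digester rules involved.
public class WellFormedCheck {

    /**
     * Parses the stream with the JDK's SAX parser and returns null if it is
     * well-formed, otherwise "line:column: message" for the first fatal error.
     */
    static String firstError(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content; its fatalError() rethrows the
            // exception, so parsing stops at the first well-formedness error.
            parser.parse(in, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            String error = firstError(in);
            System.out.println(error == null ? "well-formed" : error);
        }
    }
}
```

Run it as java WellFormedCheck Text.xml; it prints either "well-formed" or the position and message of the first error.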
Upvotes: 0
Reputation: 27326
$ grep -Hin "</\?user>" Text.xml
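will print every line containing either <user> or </user>, prefixed with the file name and line number (-H and -n; -i makes the match case-insensitive). If the tags are not nested, you should be able to inspect that output and find the missing close tag (where a <user> immediately follows another <user>). A script to do the same: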
https://gist.github.com/953837
This assumes that the open and close tags are on different lines.
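The same check can be sketched in Java without grep. This is not the linked gist (its contents are not reproduced here); it is a hypothetical ConsecutiveOpenTags class that assumes <user> elements are never nested, that the tags appear literally as <user> and </user> (no attributes), and that the file is UTF-8. It reports each <user> that is followed by another <user> before any </user>, plus one still open at end of file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class ConsecutiveOpenTags {
    // Same pattern as the grep command: <user> or </user>, case-insensitive like grep -i
    private static final Pattern TAG = Pattern.compile("</?user>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the line numbers of <user> tags that are followed by another
     * <user> (or by end of file) before any </user> appears.
     */
    static List<Integer> unclosedOpens(Iterable<String> lines) {
        List<Integer> suspects = new ArrayList<>();
        int lastOpen = -1; // line of the most recent <user> not yet closed, -1 if none
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            Matcher m = TAG.matcher(line);
            while (m.find()) {
                if (m.group().startsWith("</")) {
                    lastOpen = -1;              // the pending <user> was closed
                } else {
                    if (lastOpen != -1) {
                        suspects.add(lastOpen); // <user> directly after <user>
                    }
                    lastOpen = lineNo;
                }
            }
        }
        if (lastOpen != -1) {
            suspects.add(lastOpen);             // still open at end of file
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        // Files.lines reads the file lazily (UTF-8 by default), so a
        // multi-million-line file is not loaded into memory all at once.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            for (int lineNo : unclosedOpens(lines::iterator)) {
                System.out.println("possibly unclosed <user> at line " + lineNo);
            }
        }
    }
}
```

Unlike the grep output, this version also copes with several tags on one line, since it scans every match within each line.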
Upvotes: 1
Reputation: 120526
Use tidy -xml -e <your-xml-file> (http://tidy.sourceforge.net/).
Tidy is a great little tool for validating HTML, and in XML mode (-xml above) it will validate XML as well.
It prints out line and column numbers for parse errors.
Most of the major package managers (apt, port, etc.) will have pre-built packages for it.
Upvotes: 1
Reputation: 139991
Write another script that scans each line of the file and, whenever it finds an opening <user> tag, increments a counter and prints
line number 1234 <user> opened (1 open total)
and whenever it finds a closing </user> tag, decrements the counter and prints
line number 4546 </user> closed (0 open total)
Since you have one more opening tag than closing tags, the final output of this script will tell you that 1 tag was left open. However, assuming your XML model does not allow nested <user> tags, you can assume the problematic declaration is wherever you see the output line number ... <user> opened (2 open total).
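A minimal Java sketch of that counter, assuming the file is UTF-8 and the tags appear literally as <user> and </user> (the pattern does not allow attributes). The class and method names are hypothetical; report lines are passed to a Consumer so that millions of them need not be held in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class UserTagCounter {
    private static final Pattern TAG = Pattern.compile("</?user>");

    /** Emits one report line per <user>/</user> tag, in the format described above. */
    static void report(Iterable<String> lines, Consumer<String> out) {
        int open = 0;   // current number of unclosed <user> tags
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            Matcher m = TAG.matcher(line);
            while (m.find()) {
                if (m.group().startsWith("</")) {
                    open--;
                    out.accept("line number " + lineNo + " </user> closed (" + open + " open total)");
                } else {
                    open++;
                    out.accept("line number " + lineNo + " <user> opened (" + open + " open total)");
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            report(lines::iterator, System.out::println);
        }
    }
}
```

With a 2.3-million-line file the output is long, so piping it through grep "(2 open total)" should show the suspect opening tag directly.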
Upvotes: 2