Reputation: 3420
I am trying to work with an XML file from the SBA API:
http://api.sba.gov/loans_grants/federal_and_state_financing_for/ny.xml
The problem is that when I try to parse this XML with XPath, I get this error:
[Fatal Error] loans_grants.dtd:3:22: White space is required before the attribute type in the declaration of attribute "CDATA" for element "count".
Exception in thread "main" org.xml.sax.SAXParseException: White space is required before the attribute type in the declaration of attribute "CDATA" for element "count".
After looking at the XML file, I think the problem is in the following lines and similar lines after them:
<grant_loans count="103">
<industry nil="true"/>
<state_name nil="true"/>
I think that if there were a space between count and "103", and between nil and "true", this error would not happen. Since the whole XML is too big, I copied a portion of it, made these changes, and saved it locally. I could then parse it without error. I just added spaces like this:
<grant_loans count = "103">
How can I do this in my program for every place that requires a space, and then use the result for further parsing?
I can post my Java code here if you need it, but that code works for other XML files, so I think the problem is in this XML file.
Edit
Java code segment:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
XPathExpression expr = null;
builder = factory.newDocumentBuilder();
doc = (Document) builder
.parse("http://maps.googleapis.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway&sensor=false");
// Create a XPathFactory
XPathFactory xFactory = XPathFactory.newInstance();
// Create a XPath object
XPath xpath = xFactory.newXPath();
// Compile the XPath expression
expr = xpath.compile("//geometry/location/lat/text()");
System.out.println("expr" + expr);
// Run the query and get a nodeset
Object result = expr.evaluate(doc, XPathConstants.NODESET);
// Cast the result to a DOM NodeList
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
// this works
//
// some other code
//
builder = factory.newDocumentBuilder();
url = "http://api.sba.gov/loans_grants/federal_and_state_financing_for/ny.xml";
doc = builder.parse(url); // problem occurs here
xFactory = XPathFactory.newInstance();
// Create a XPath object
xpath = xFactory.newXPath();
// Compile the XPath expression
expr = xpath.compile("//grant_loan/url/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
// Cast the result to a DOM NodeList
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
// other stuff
Upvotes: 0
Views: 10724
Reputation: 128829
It's not the XML. It's telling you that the DTD is jacked up. Note the loans_grants.dtd:3:22 at the beginning of the error: that is the file, line, and column where the parser gave up. It's pointing at line 3 of the DTD:
<!ATTLIST count CDATA>
which should probably instead read
<!ATTLIST grant_loans count CDATA #REQUIRED>
The error is pointing out that the proper format of an ATTLIST is:
<!ATTLIST element-name attribute-name attribute-type default-value>
It saw the string "CDATA" in the third position, assumed that was the attribute-name, and still expected to get an attribute-type, but instead it found the end of the ATTLIST. That's why it gave the potentially confusing message about expecting white space.
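To see this for yourself, here is a minimal sketch that feeds both forms of the ATTLIST to the JDK's default parser through an internal DTD subset. The class name AttlistDemo and the helper tryParse are made up for illustration, and the exact wording of the message may vary by JDK version and locale.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class AttlistDemo {
    // Hypothetical helper: wraps the given ATTLIST declaration in an internal
    // DTD subset and parses a tiny document that uses it.
    // Returns null if parsing succeeds, otherwise the parser's error message.
    static String tryParse(String attlist) throws Exception {
        String xml = "<!DOCTYPE grant_loans [" + attlist + "]>"
                   + "<grant_loans count=\"103\"/>";
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // The declaration as it appears on line 3 of the DTD: element name and
        // attribute name only, no attribute type or default value.
        System.out.println("broken: " + tryParse("<!ATTLIST count CDATA>"));
        // The presumably intended declaration.
        System.out.println("fixed:  "
                + tryParse("<!ATTLIST grant_loans count CDATA #REQUIRED>"));
    }
}
```

The first call should report a parse error about the attribute type, while the second should parse cleanly and print null for the message.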
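Most likely, when you copied some of the XML to run against locally, you left off the DOCTYPE declaration that references the DTD. Without it the parser never loads the broken DTD, which is why your local copy parsed, and not because of the extra spaces. Leaving the DTD out of the picture would also solve the problem for the real file.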
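If you can't get the DTD fixed on the server, one way to keep the parser away from it is to install an EntityResolver that answers every external entity request with empty content. This is only a sketch: the class name SkipDtdExample and the method parseIgnoringDtd are made up, the sample document is a cut-down stand-in for the real feed, and the approach assumes the XML does not rely on entities or default attribute values defined in the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SkipDtdExample {
    // Hypothetical helper: parses the source without ever reading the external DTD.
    static Document parseIgnoringDtd(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Hand the parser an empty stream for any external entity, including
        // loans_grants.dtd, so the broken declarations are never scanned.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real feed, with a DOCTYPE pointing at the DTD.
        String xml = "<?xml version=\"1.0\"?>"
                   + "<!DOCTYPE grant_loans SYSTEM \"loans_grants.dtd\">"
                   + "<grant_loans count=\"103\"><industry nil=\"true\"/></grant_loans>";
        Document doc = parseIgnoringDtd(new InputSource(new StringReader(xml)));
        System.out.println(doc.getDocumentElement().getAttribute("count"));
    }
}
```

For the real file you would pass new InputSource(url) with your SBA URL instead of the string, and then run your XPath expression on the returned Document as before.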
Upvotes: 1