Angel Picallo

XPath to access an attribute value with a name that has special characters

I'm trying to access an attribute value, but the attribute name contains special characters, for example:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
    <ELEMENT1 at:it="true">W</ELEMENT1>------
    <ELEMENT2>IN</ELEMENT2>
    <ELEMENT3>RP</ELEMENT3>
    <ELEMENT4>KKK</ELEMENT4>
  </row>
  <row>
    <ELEMENT1 acón='sys'>2</ELEMENT1>------
    <ELEMENT2>ARQ</ELEMENT2>
    <ELEMENT3>MR</ELEMENT3>
    <ELEMENT4>AC</ELEMENT4>
  </row>
  <row>
    <ELEMENT1>3</ELEMENT1>
    <ELEMENT2>I</ELEMENT2>
    <ELEMENT3 at:it="true" >RP</ELEMENT3>------
    <ELEMENT4>KKK</ELEMENT4>
  </row>
  <row>
    <ELEMENT1>1</ELEMENT1>
    <ELEMENT2>CC</ELEMENT2>
    <ELEMENT3>XX</ELEMENT3>
    <ELEMENT4 eléct='false' >I</ELEMENT4>------
  </row>
  <row>
    <ELEMENT1>12</ELEMENT1>
    <ELEMENT2 at:it="true" >IN</ELEMENT2>------
    <ELEMENT3>3</ELEMENT3>
    <ELEMENT4></ELEMENT4>
  </row>
</root>

If I rename the attributes to remove the special characters, I can access them:

at:it ------> atit
acón ------> acon
eléct ------> elect

But I cannot access attributes whose names contain special characters with an XPath expression.

How can I access the values of attributes whose names contain special characters in an XML file?

To parse the XML file into a DOM I used Java 6, javax.xml and org.w3c.dom.

Answers (3)

Sofia smile

  1. First get the attributes of your nodes, then check their names.

  2. Select the nodes, something like: XPath xpath = XPathFactory.newInstance().newXPath(); NodeList nodes = (NodeList) xpath.evaluate(filteringExpression, xmlDocument, XPathConstants.NODESET);

  3. Then iterate through the nodes and, for each node, get its attributes: Node node = nodes.item(idx); NamedNodeMap nl = node.getAttributes();

  4. Then iterate through the attributes and, if the name matches, get its value: Attr attr = (Attr) nl.item(i); if (attr.getName().equals(...)) { String attributeValue = attr.getValue(); }
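The four steps can be put together roughly as follows. This is a minimal sketch, not the answerer's own code: the class name AttributeScan, the trimmed SAMPLE document and the use of //* (every element) as the filtering expression are assumptions. The parser is deliberately not namespace-aware, so at:it stays an ordinary attribute name and getName() returns it unchanged. The accented names are written as Unicode escapes (\u00f3, \u00e9) so the source file stays ASCII.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeScan {

    // Hypothetical test data, trimmed from the question's XML.
    static final String SAMPLE = "<root>"
        + "<row><ELEMENT1 at:it=\"true\">W</ELEMENT1></row>"
        + "<row><ELEMENT1 ac\u00f3n='sys'>2</ELEMENT1></row>"
        + "<row><ELEMENT4 el\u00e9ct='false'>I</ELEMENT4></row>"
        + "</root>";

    // Returns the value of the first attribute whose name equals attrName,
    // or null if no element carries such an attribute.
    static String valueOf(String xml, String attrName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Not namespace-aware: "at:it" is kept as a plain attribute name.
            factory.setNamespaceAware(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Step 2: select the candidate nodes; "//*" means every element.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//*", doc, XPathConstants.NODESET);

            // Steps 3 and 4: walk each node's attributes and compare names.
            for (int idx = 0; idx < nodes.getLength(); idx++) {
                Node node = nodes.item(idx);
                NamedNodeMap nl = node.getAttributes();
                for (int i = 0; i < nl.getLength(); i++) {
                    Attr attr = (Attr) nl.item(i);
                    if (attr.getName().equals(attrName)) {
                        return attr.getValue();
                    }
                }
            }
            return null;
        } catch (Exception e) {
            // Parser or XPath failure, rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueOf(SAMPLE, "at:it"));       // prints true
        System.out.println(valueOf(SAMPLE, "ac\u00f3n"));   // prints sys
        System.out.println(valueOf(SAMPLE, "el\u00e9ct"));  // prints false
    }
}
```

Because the name comparison happens in Java rather than in XPath, this works for any attribute name the parser accepts; the trade-off is that every attribute of every selected node is visited.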

kjhughes

Realize that a colon (:) should only be used in an element or attribute name as part of a namespace prefix:

Note (from the XML 1.0 Recommendation):

The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.

So,

/root/row/ELEMENT1/@at:it

will select "true", provided that you either declare the at namespace prefix in your XML (preferable),

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:at="http://example.com/at">
  <row>
    <ELEMENT1 at:it="true">W</ELEMENT1>------
    <ELEMENT2>IN</ELEMENT2>
    <ELEMENT3>RP</ELEMENT3>
    <ELEMENT4>KKK</ELEMENT4>
  </row>
  <row>
    <ELEMENT1 acón='sys'>2</ELEMENT1>------
    <ELEMENT2>ARQ</ELEMENT2>
    <ELEMENT3>MR</ELEMENT3>
    <ELEMENT4>AC</ELEMENT4>
  </row>
  <row>
    <ELEMENT1>3</ELEMENT1>
    <ELEMENT2>I</ELEMENT2>
    <ELEMENT3 at:it="true" >RP</ELEMENT3>------
    <ELEMENT4>KKK</ELEMENT4>
  </row>
  <row>
    <ELEMENT1>1</ELEMENT1>
    <ELEMENT2>CC</ELEMENT2>
    <ELEMENT3>XX</ELEMENT3>
    <ELEMENT4 eléct='false' >I</ELEMENT4>------
  </row>
  <row>
    <ELEMENT1>12</ELEMENT1>
    <ELEMENT2 at:it="true" >IN</ELEMENT2>------
    <ELEMENT3>3</ELEMENT3>
    <ELEMENT4></ELEMENT4>
  </row>
</root>

or instruct your XML processor to ignore XML namespaces (not a best practice).
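In Java's JAXP API, declaring the prefix in the XML is only half of the change: the XPath object also needs a NamespaceContext that binds at to the same URI, because a prefix in an XPath expression is resolved through that context, not through the document. A minimal sketch, assuming the example URI http://example.com/at from the XML above and one-row sample documents; the class and method names are hypothetical. It also checks that a namespace-aware parser rejects the original XML, where at: is never declared.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class NamespacedAttribute {

    // Example namespace URI, as in the corrected XML above.
    static final String AT_NS = "http://example.com/at";

    // One-row samples: the original (prefix not declared) and the corrected one.
    static final String UNDECLARED =
        "<root><row><ELEMENT1 at:it=\"true\">W</ELEMENT1></row></root>";
    static final String DECLARED = "<root xmlns:at=\"" + AT_NS + "\">"
        + "<row><ELEMENT1 at:it=\"true\">W</ELEMENT1></row></root>";

    static Document parseNamespaceAware(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Returns true if a namespace-aware parser refuses the document.
    static boolean rejected(String xml) {
        try {
            parseNamespaceAware(xml);
            return false;
        } catch (SAXParseException e) {
            return true; // e.g. the prefix "at" is not bound
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates /root/row/ELEMENT1/@at:it with the prefix "at" bound to AT_NS.
    static String selectAtIt(String xml) {
        try {
            Document doc = parseNamespaceAware(xml);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "at".equals(prefix) ? AT_NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String namespaceURI) {
                    return AT_NS.equals(namespaceURI) ? "at" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return AT_NS.equals(namespaceURI)
                        ? Collections.singletonList("at").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            return xpath.evaluate("/root/row/ELEMENT1/@at:it", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rejected(UNDECLARED)); // prints true
        System.out.println(selectAtIt(DECLARED)); // prints true
    }
}
```

When rejected(UNDECLARED) runs, the default error handler may also print a "[Fatal Error]" line to standard error before the exception is thrown.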

The next two cases are fine:

/root/row/ELEMENT1/@acón

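will select "sys" without problem, provided your XPath processor handles non-ASCII name characters such as ó correctly (it should; XML names may contain such letters, and UTF-8 is the default encoding of an XML file).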

/root/row/ELEMENT4/@eléct

will select "false" similarly.
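Whether these accented names work in Java usually depends on two things: the XML bytes must be decoded with the encoding the document declares (UTF-8 by default), and the accented names in your Java string literals must survive compilation. A minimal sketch, assuming an in-memory sample built from the question's XML: it encodes the document as UTF-8 bytes with a matching XML declaration, and writes ó and é as Unicode escapes so the result does not depend on the source file's encoding. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class AccentedAttributes {

    // \u00f3 and \u00e9 are Unicode escapes for o-acute and e-acute, so this
    // source file stays ASCII and does not depend on javac's -encoding setting.
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<root>"
        + "<row><ELEMENT1 ac\u00f3n='sys'>2</ELEMENT1></row>"
        + "<row><ELEMENT4 el\u00e9ct='false'>I</ELEMENT4></row>"
        + "</root>";

    static String select(String expression) {
        try {
            // Encode exactly as the XML declaration says; the parser decodes it back.
            byte[] bytes = XML.getBytes("UTF-8");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // no prefixes in this sample, so this is safe
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/root/row/ELEMENT1/@ac\u00f3n"));  // prints sys
        System.out.println(select("/root/row/ELEMENT4/@el\u00e9ct")); // prints false
    }
}
```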

vanje

I tried it with Java 6 and had no problems accessing attributes with accents. The colon is a special case, because it is used to denote element/attribute names with namespace prefixes. The XML doesn't use namespaces; otherwise there would be a namespace declaration for the prefix at.

The XML parser has a switch to treat colons as part of the name, but the XPath engine is always namespace-aware. With a little trick it is still possible:

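import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// If this file is saved as UTF-8, compile it with "javac -encoding UTF-8"
// so that the accented string literals are read correctly.
public class AttributeTest {
    public static void main(String[] args) throws Exception {
        File xmlFile = new File("in.xml");
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Parse without namespaces. Otherwise parsing leads to an error
        // because there is no namespace declaration for prefix 'at'.
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(xmlFile);

        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();

        XPathExpression expr1 = xpath.compile("/root/row/ELEMENT1/@acón");
        // Doesn't work: the XPath engine is namespace-aware and 'at' is not bound.
        //XPathExpression expr2 = xpath.compile("/root/row/ELEMENT1/@at:it");
        XPathExpression expr2 = xpath.compile("/root/row/ELEMENT1/@*[name() = 'at:it']");
        XPathExpression expr3 = xpath.compile("/root/row/ELEMENT4/@eléct");

        System.out.println((String) expr1.evaluate(doc, XPathConstants.STRING));
        System.out.println((String) expr2.evaluate(doc, XPathConstants.STRING));
        System.out.println((String) expr3.evaluate(doc, XPathConstants.STRING));
    }
}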

The output is:

sys
true
false
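The expressions above stop at ELEMENT1, but the question's XML carries at:it on ELEMENT1, ELEMENT3 and ELEMENT2. The same name() trick can collect all of them as a node set. A minimal sketch under the same namespace-unaware parsing, with a trimmed sample document and hypothetical class and method names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AllAtItAttributes {

    // Trimmed from the question's XML: at:it appears on three different elements.
    static final String XML = "<root>"
        + "<row><ELEMENT1 at:it=\"true\">W</ELEMENT1><ELEMENT2>IN</ELEMENT2></row>"
        + "<row><ELEMENT1>3</ELEMENT1><ELEMENT3 at:it=\"true\">RP</ELEMENT3></row>"
        + "<row><ELEMENT1>12</ELEMENT1><ELEMENT2 at:it=\"true\">IN</ELEMENT2></row>"
        + "</root>";

    // Returns "ElementName=value" for every at:it attribute, in document order.
    static List<String> findAtIt() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false); // 'at' is undeclared, as in the question
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attrs = (NodeList) xpath.evaluate(
                "//@*[name() = 'at:it']", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<String>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                result.add(attr.getOwnerElement().getNodeName() + "=" + attr.getValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findAtIt()); // prints [ELEMENT1=true, ELEMENT3=true, ELEMENT2=true]
    }
}
```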
