guettli

Reputation: 27149

Access text of next sibling

Here is part of a Jenkins XML file.

I want to extract the defaultValue of project_name with XPath.

In this case the value is *****.

<?xml version='1.0' encoding='UTF-8'?>
<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
 </project>

I use etree in Python, but AFAIK this does not matter much since this is an XPath question.

My current xpath knowledge is limited. My current approach:

for name_tag in config.findall('.//name'):
    if name_tag.text == 'project_name':
        default=name_tag.getparent().findall('defaultValue')[0].text

Here I get AttributeError: 'Element' object has no attribute 'getparent'

I thought about this again, and I think that looping in python is the wrong approach. This should be selectable via xpath.

Upvotes: 2

Views: 884

Answers (2)

Mathias Müller

Reputation: 22647

The XPath answer to your question is

/project/properties/hudson.model.ParametersDefinitionProperty/parameterDefinitions/hudson.model.StringParameterDefinition[name = 'project_name']/defaultValue/text()

which will select as the only result

*****

This assumes that your actual document does not use a namespace. You do not need to access the parent element or use a sibling axis.

etree should also support this kind of XPath expression, but it might not; see the comment by har07.
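As far as I know, the standard library's xml.etree.ElementTree implements only a subset of XPath: it has no text() node test, and it rejects absolute paths when called on an element. The child-value predicate [name='project_name'] is part of that subset, though, so a minimal sketch without lxml could look like this (CONFIG_XML and default_value are names made up for the illustration):

```python
import xml.etree.ElementTree as ET

# Copy of the config from the question, with the XML declaration omitted.
CONFIG_XML = """<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
</project>"""

root = ET.fromstring(CONFIG_XML)

# The path is relative to the root element; findtext() returns the text
# of the first match, which stands in for the unsupported text() step.
default_value = root.findtext(
    ".//hudson.model.StringParameterDefinition[name='project_name']/defaultValue"
)
print(default_value)  # prints *****
```

findtext() returns None when nothing matches, so a missing parameter can be detected without catching an exception.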


"I thought about this again, and I think that looping in python is the wrong approach. This should be selectable via xpath."

Yes, I agree. If you want to select a single value from a document, select it with an XPath expression and store it as a Python string directly, without looping through elements.


Full example with lxml

from lxml import etree
from io import StringIO  # Python 3; in Python 2 this was "from StringIO import StringIO"

document_string = """<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
 </project>"""

# Parse the string as if it were a file.
tree = etree.parse(StringIO(document_string))

# xpath() returns a list of matching text nodes.
result_list = tree.xpath("/project/properties/hudson.model.ParametersDefinitionProperty/parameterDefinitions/hudson.model.StringParameterDefinition[name = 'project_name']/defaultValue/text()")

print(result_list[0])

Output:

*****
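The example above yields a list, so the value still has to be taken out with an index. If a plain string is preferred, one option is to wrap the expression in the XPath string() function, which lxml hands back as a Python string. This is only a sketch; CONFIG_XML and default_value are illustrative names, and the path is shortened with // for readability:

```python
from lxml import etree

# Same structure as the config in the question.
CONFIG_XML = """<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
</project>"""

root = etree.fromstring(CONFIG_XML)

# string(...) converts the first matching node to its text content,
# so the result is a string rather than a list.
default_value = root.xpath(
    "string(//hudson.model.StringParameterDefinition"
    "[name = 'project_name']/defaultValue)"
)
print(default_value)  # prints *****
```

Note that string() evaluates to an empty string when nothing matches, so this variant cannot distinguish a missing parameter from an empty default value.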

Upvotes: 2

Learner

Reputation: 5302

You can try lxml.etree as shown below. I loop over the results so that every node matched by an expression is printed.

Examples of the XPath expressions you need follow. I used relative paths, since they are convenient when the full node path is long.

.//hudson.model.StringParameterDefinition/name[contains(text(),'project_name')]/following-sibling::defaultValue

OR

.//hudson.model.StringParameterDefinition/name[contains(text(),'project_name')]/following::defaultValue[1]

from lxml import etree as et

data = """<?xml version='1.0' encoding='UTF-8'?>
<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
 </project>"""

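# In Python 3, lxml rejects str input that carries an encoding declaration, so pass bytes.
tree = et.fromstring(data.encode("utf-8"))

# All default values, regardless of the parameter name.
print([i.text for i in tree.xpath(".//hudson.model.StringParameterDefinition/defaultValue")])
# The defaultValue sibling that follows the matching name element.
print([i.text for i in tree.xpath(".//hudson.model.StringParameterDefinition/name[contains(text(),'project_name')]/following-sibling::defaultValue")])
# The first defaultValue anywhere after the matching name element in document order.
print([i.text for i in tree.xpath(".//hudson.model.StringParameterDefinition/name[contains(text(),'project_name')]/following::defaultValue[1]")])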

Output:

['my_customer', '*****']
['*****']
['*****']
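One caveat with contains(): it matches any name that merely includes the substring, so a parameter such as project_name_suffix would be selected as well. A small sketch of the difference, using a made-up config (the second parameter and both default values are hypothetical and not part of the question's file):

```python
from lxml import etree

# Hypothetical config: the second name only contains "project_name".
CONFIG_XML = """<parameterDefinitions>
    <hudson.model.StringParameterDefinition>
        <name>project_name</name>
        <defaultValue>first</defaultValue>
    </hudson.model.StringParameterDefinition>
    <hudson.model.StringParameterDefinition>
        <name>project_name_suffix</name>
        <defaultValue>second</defaultValue>
    </hudson.model.StringParameterDefinition>
</parameterDefinitions>"""

root = etree.fromstring(CONFIG_XML)

# Substring test: both name elements qualify.
loose = root.xpath(
    ".//name[contains(text(), 'project_name')]"
    "/following-sibling::defaultValue/text()"
)

# Exact comparison: only the name that equals 'project_name' qualifies.
exact = root.xpath(
    ".//name[. = 'project_name']/following-sibling::defaultValue/text()"
)

print(loose)  # ['first', 'second']
print(exact)  # ['first']
```

If the parameter names in your config can share a prefix, the exact comparison is probably the safer choice.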

Upvotes: 1
