user1732337

Reputation: 23

Python XPath expression to find any attribute containing a given text

I have the following XML document:

    <RootNode>
         <SubNode name="MainNode" SubNodeID="1">
           <SubSubNode SubSubID="10" SubSubName="Product Food">
             <Item subItemID="100" ItemName="Apple" OtherName="Gala"/>
             <Item subItemID="101" ItemName="Apple" OtherName="Aroma"/>
             <Item subItemID="102" ItemName="Pear" OtherName="Williams"/>
             <Item subItemID="103" ItemName="Pear" OtherName="Abate"/>
             <Item subItemID="104" ItemName="Cranberry" OtherName="Bilberry"/>
             <Item subItemID="105" ItemName="Cranberry" OtherName="Bluberries"/>
             <Item subItemID="106" ItemName="Strawberry" OtherName="Berry"/>
             <Item subItemID="107" ItemName="Peach" OtherName="Nectarina"/>
          </SubSubNode>
          <SubSubNode SubSubID="20" SubSubName="Product Beverage">
            <Item subItemID="108" ItemName="Cola" OtherName="Coca cola"/>
            <Item subItemID="109" ItemName="Cola" OtherName="Pepsi"/>
            <Item subItemID="110" ItemName="Orange Juice" OtherName="Fanta"/>
            <Item subItemID="111" ItemName="Soft drink" OtherName="Grape soda"/>
            <Item subItemID="112" ItemName="Soft drink" OtherName="Orange soda"/>
            <Item subItemID="113" ItemName="Soft drink" OtherName="Grape soda"/>
          </SubSubNode>
        </SubNode>
    </RootNode>

I load it with the usual statements:

    import xml.etree.ElementTree as ET

    tree = ET.parse('Food.xml')
    root = tree.getroot()

I can find items with a specific attribute value, such as OtherName="Gala", using:

    xPath = "SubNode/SubSubNode/Item[@OtherName='Gala']"
    print(len(root.findall(xPath)))

What if I want to search for a text in any attribute? In full XPath I would write something like:

    //*[@*[contains(., 'berry')]] 

But when I implement it in Python, I get "SyntaxError: invalid predicate":

    search_text = "berry"
    # XPath expression to match any element with any attribute containing search_text
    xpath_expr = f".//*[@*[contains(., '{search_text}')]]"
    root.findall(xpath_expr)  # raises SyntaxError: invalid predicate

Any ideas? Thank you for your help

Upvotes: 0

Views: 39

Answers (1)

Hermann12

Reputation: 3476

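As described in the comments, lxml is the better way: its xpath() method supports full XPath 1.0, whereas ElementTree implements only a limited subset. An alternative solution without XPath is to iterate over all elements and check their attribute values. The example below looks for attributes whose value is exactly "some text":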

    import xml.etree.ElementTree as ET

    xml_s = """<RootNode>
             <SubNode name="MainNode" SubNodeID="1">
               <SubSubNode SubSubID="10" SubSubName="Product Food">
                 <Item subItemID="100" ItemName="Apple" OtherName="Gala"/>
                 <Item subItemID="101" ItemName="Apple" OtherName="Aroma"/>
                 <Item subItemID="102" ItemName="Pear" OtherName="Williams"/>
                 <Item subItemID="103" ItemName="Pear" OtherName="Abate"/>
                 <Item subItemID="104" ItemName="Cranberry" OtherName="Bilberry"/>
                 <Item subItemID="105" ItemName="Cranberry" OtherName="Bluberries"/>
                 <Item subItemID="106" ItemName="Strawberry" OtherName="some text"/>
                 <Item subItemID="107" ItemName="Peach" OtherName="Nectarina"/>
              </SubSubNode>
              <SubSubNode SubSubID="20" SubSubName="Product Beverage">
                <Item subItemID="108" ItemName="Cola" OtherName="Coca cola"/>
                <Item subItemID="109" ItemName="Cola" OtherName="Pepsi"/>
                <Item subItemID="110" ItemName="Orange Juice" OtherName="Fanta"/>
                <Item subItemID="111" ItemName="Soft drink" OtherName="Grape soda"/>
                <Item subItemID="112" ItemName="some text" OtherName="Orange soda"/>
                <Item subItemID="113" ItemName="Soft drink" OtherName="Grape soda"/>
              </SubSubNode>
            </SubNode>
        </RootNode>"""

    root = ET.fromstring(xml_s)

    element_list = []
    for some_text in root.iter():
        if "some text" in some_text.attrib.values():
            # print(some_text.tag, some_text.attrib)
            element_list.append(some_text)

    # Find the keys with "some text"
    for elem in element_list:
        keys = [k for k, v in elem.attrib.items() if v == 'some text']
        print(keys)

Output:

    ['OtherName']
    ['ItemName']
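
The code above matches attribute values that are exactly equal to the search string. The question asks for values that contain a substring, as contains() does in XPath, so here is a minimal sketch of that variant using only the standard library. The helper name find_by_attr_substring and the shortened XML sample are my own, for illustration only. The match is case-sensitive, so "Berry" does not match "berry".

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the question's XML (illustration only)
xml_s = """<RootNode>
  <SubNode name="MainNode" SubNodeID="1">
    <SubSubNode SubSubID="10" SubSubName="Product Food">
      <Item subItemID="104" ItemName="Cranberry" OtherName="Bilberry"/>
      <Item subItemID="106" ItemName="Strawberry" OtherName="Berry"/>
      <Item subItemID="107" ItemName="Peach" OtherName="Nectarina"/>
    </SubSubNode>
  </SubNode>
</RootNode>"""


def find_by_attr_substring(root, search_text):
    """Return (element, attribute_names) pairs for every element that has
    at least one attribute value containing search_text (case-sensitive)."""
    matches = []
    for elem in root.iter():
        # Collect the names of all attributes whose value contains the text
        keys = [k for k, v in elem.attrib.items() if search_text in v]
        if keys:
            matches.append((elem, keys))
    return matches


root = ET.fromstring(xml_s)
for elem, keys in find_by_attr_substring(root, "berry"):
    print(elem.get("subItemID"), keys)
```

For a case-insensitive search, one option is to compare search_text.lower() against v.lower() inside the list comprehension.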

Upvotes: 0
