Goroshek

Reputation: 81

Using regex in xml etree parsing

I need to parse an XML file and find only the values that start with "123". How can I do this using the code below? Is it possible to use a regex inside this syntax?

import xml.etree.ElementTree as ET
parse = ET.parse('xml.xml')
print([ events.text for record in parse.findall('.configuration/system/') for events in record.findall('events')])

xml.xml

<rpc-reply>
 <configuration>
        <system>
            <preference>
                <events>123</events>
                <events>124</events>
                <events>1235</events>                    
            </preference>
        </system>
 </configuration>
</rpc-reply>

Upvotes: 0

Views: 3099

Answers (1)

har07

Reputation: 89325

An XPath predicate can do this with the built-in function starts-with(). However, you need a library that fully supports XPath 1.0, such as lxml:

from lxml import etree as ET
raw = '''<rpc-reply>
 <configuration>
        <system>
            <preference>
                <events>123</events>
                <events>124</events>
                <events>1235</events>                    
            </preference>
        </system>
 </configuration>
</rpc-reply>'''
root = ET.fromstring(raw)
query = 'configuration/system/preference/events[starts-with(.,"123")]'
print([events.text for events in root.xpath(query)])

If you still want to use regex, lxml supports it through the EXSLT regular-expression extension functions, even though the XPath 1.0 specification itself does not include regex (see: Regex in lxml for python).
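As a rough sketch of the regex route in lxml (assuming lxml is installed): the EXSLT functions live in the namespace http://exslt.org/regular-expressions, and the prefix `re` below is just a name chosen for this example and bound to that namespace in the xpath() call.

```python
from lxml import etree as ET

raw = '''<rpc-reply>
 <configuration>
        <system>
            <preference>
                <events>123</events>
                <events>124</events>
                <events>1235</events>
            </preference>
        </system>
 </configuration>
</rpc-reply>'''
root = ET.fromstring(raw)

# Bind a prefix (the name "re" is arbitrary) to the EXSLT regex namespace.
ns = {'re': 'http://exslt.org/regular-expressions'}

# re:test() returns true when the pattern is found in the node's text;
# the ^ anchor restricts the match to the start of the value.
query = 'configuration/system/preference/events[re:test(., "^123")]'
matches = [events.text for events in root.xpath(query, namespaces=ns)]
print(matches)
```

For a plain prefix check like this one, starts-with() is simpler; the regex form is mainly worthwhile when the pattern is more involved.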

xml.etree supports only a limited subset of XPath 1.0, which does not include the starts-with() function (and certainly not regex). With xml.etree you therefore need to rely on a Python string method for the check:

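# ... (parse the document into `root` as before)
query = 'configuration/system/preference/events'
# `events.text` is None for an empty element, so fall back to '' before calling startswith()
print([events.text for events in root.findall(query) if (events.text or '').startswith('123')])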
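If a regex is still wanted while staying with the standard library, one option is to select the elements with xml.etree and filter them with Python's re module. This is a minimal sketch, not part of the original answer; the pattern and variable names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

raw = '''<rpc-reply>
 <configuration>
        <system>
            <preference>
                <events>123</events>
                <events>124</events>
                <events>1235</events>
            </preference>
        </system>
 </configuration>
</rpc-reply>'''
root = ET.fromstring(raw)

# re.match() only matches at the beginning of the string,
# so this pattern means "text starts with 123".
pattern = re.compile(r'123')

query = 'configuration/system/preference/events'
matches = [
    events.text
    for events in root.findall(query)
    # Skip empty elements, whose .text is None.
    if events.text and pattern.match(events.text)
]
print(matches)
```

The XPath here only selects the candidate elements; all of the matching logic lives in Python, so any pattern the re module accepts can be used.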

Upvotes: 1
