XPath works on one file but not on another

Question

I need to extract the id value from an XML file. I wrote the code below. It works on a simple example, but returns None on the real XML. Code:

from lxml import etree

parser = etree.XMLParser(ns_clean=True)
tree = etree.parse('real.xml', parser)
#tree = etree.parse('test.xml', parser)

#print(dir(tree.find("//id")))
print(tree.find("//id").text)

test.xml:




qqq
123

real.xml:



    
        18934116
        0373100043519000001
        2019-01-11T11:06:05.465+03:00
        №0373100043519000001
        http://zakupki.gov.ru/epz/order/notice/inm111/view/common-info.html?regNumber=0373100043519000001
        
            http://zakupki.gov.ru/epz/order/notice/printForm/viewXml.html?noticeId=18934116
            
        
        Теплоснабжение
        
            
                03731000435
                001Ч1823
                ФЕДЕРАЛЬНОЕ ГОСУДАРСТВЕННОЕ БЮДЖЕТНОЕ УЧРЕЖДЕНИЕ НАУКИ ИНСТИТУТ ВОДНЫХ ПРОБЛЕМ РОССИЙСКОЙ АКАДЕМИИ НАУК
                Российская Федерация, 119333, Москва, УЛ ГУБКИНА, 3
                Российская Федерация, 119333, Москва, УЛ ГУБКИНА, 3
                7701003690
                773601001
            
            CU
        
        
            EP111
            Закупка у единственного поставщика (подрядчика, исполнителя) с учетом положений ст. 111 Закона № 44-ФЗ
        
        
            
                1
                400000
                
                    RUB
                    Российский рубль
                
                
                    35.30.11.111
                
                191770100369077360100100100013530000
                
                    2019037310004350010001
                    2019037310004350010000300001
                
                false
            
        
        п.8, ч.1, ст.93 44ФЗ

Mads Hansen · Accepted Answer

It is easy to overlook, but in the second document the element is bound to the namespace http://zakupki.gov.ru/oos/types/1.

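If you look at the first element, you will see that the namespace is declared without a prefix, which makes it the default namespace for that element and its descendants: xmlns="http://zakupki.gov.ru/oos/types/1". An unprefixed name in an XPath 1.0 expression such as //id only matches elements in no namespace, which is why the lookup finds nothing in the second file.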
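One way to address the element in its namespace is to give the URI a prefix of your own when you search. The following is a minimal sketch, not the asker's actual file: only the namespace URI and the id value come from the question, while the `notice` root element, the prefix `t`, and the inline `SAMPLE` document are made up for illustration.

```python
from lxml import etree

# Hypothetical stand-in for real.xml. Only the namespace URI and the id value
# are taken from the question; the element layout is assumed.
SAMPLE = b"""<notice xmlns="http://zakupki.gov.ru/oos/types/1">
  <id>18934116</id>
</notice>"""

# The prefix "t" is arbitrary; only the URI has to match the document.
NS = {"t": "http://zakupki.gov.ru/oos/types/1"}

root = etree.fromstring(SAMPLE)

# Unprefixed name: searches for an id element in no namespace.
unqualified = root.find(".//id")

# Prefixed name resolved through the namespaces mapping.
qualified = root.find(".//t:id", namespaces=NS)

# Equivalent lookup using Clark notation, {uri}localname.
clark = root.find(".//{http://zakupki.gov.ru/oos/types/1}id")

print(unqualified)     # None
print(qualified.text)  # 18934116
print(clark.text)      # 18934116
```

The prefix used in the query does not need to match anything in the file, because the match is made on the namespace URI.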

If you want to select any element with a local-name() of id regardless of what the namespace is, you could change your XPath to:

//*[local-name() = 'id']
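With lxml, note that find() and findall() take the reduced ElementPath syntax and do not evaluate XPath functions, so an expression with local-name() should go through xpath() instead. A small sketch, again using a made-up `notice` wrapper around the id value from the question:

```python
from lxml import etree

# Hypothetical stand-in for real.xml; the element layout is assumed.
SAMPLE = b"""<notice xmlns="http://zakupki.gov.ru/oos/types/1">
  <id>18934116</id>
</notice>"""

root = etree.fromstring(SAMPLE)

# xpath() evaluates full XPath 1.0, so local-name() is available.
# It returns a list of matching elements, possibly empty.
matches = root.xpath("//*[local-name() = 'id']")

# Guard against an empty result instead of calling .text on None.
first_id = matches[0].text if matches else None

print(first_id)  # 18934116
```

Since xpath() returns a list, checking for an empty result also avoids the AttributeError you would get from calling .text on a failed find().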

With XPath 2.0 or later, you could use a wildcard for the namespace:

//*:id
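Keep in mind that lxml is built on libxml2, which implements XPath 1.0, so the `*:id` form is not available there. As far as I know, the closest equivalent in lxml is the `{*}` namespace wildcard accepted by find() and related methods. A sketch under that assumption, with the same made-up `notice` wrapper as a stand-in for the real file:

```python
from lxml import etree

# Hypothetical stand-in for real.xml; the element layout is assumed.
SAMPLE = b"""<notice xmlns="http://zakupki.gov.ru/oos/types/1">
  <id>18934116</id>
</notice>"""

root = etree.fromstring(SAMPLE)

# "{*}id" matches an element named id in any namespace, or in none.
wild = root.find(".//{*}id")

print(wild.text if wild is not None else None)
```

This keeps the original find()-style code almost unchanged, and it also matches the un-namespaced id element of a file like test.xml.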
