S Andrew

Reputation: 7198

How to read data from an XML file in Python

I have the following XML file:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <BodyWeight>331.40</BodyWeight>
    <UnitID>1</UnitID>
    <Plant>239</Plant>
    <pieces>
      <TExportCarcassPiece index="0">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
      <TExportCarcassPiece index="1">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum>
    <BodyWeight>334.40</BodyWeight>
    <UnitID>1</UnitID>
    <Plant>278</Plant>
    <pieces>
      <TExportCarcassPiece index="0">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
      <TExportCarcassPiece index="1">
        <Bruising>0</Bruising>
        <RFIDPlant></RFIDPlant>
      </TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
</rootnode>

I am using Python's lxml module to read data from the XML file like this:

from lxml import etree

doc = etree.parse('file.xml')

memoryElem = doc.find('BodyNum')
print(memoryElem)        

But it only prints None instead of 6168. Please suggest what I am doing wrong here.

Upvotes: 3

Views: 8470

Answers (5)

O Yahya

Reputation: 376

1 - Use / in the path to specify the tree level of the element you want to extract

2 - Use .text to extract the text content of the element (the element's name is in .tag)

from lxml import etree

doc = etree.parse('file.xml')
memoryElem = doc.find("*/BodyNum")  # any child of the root (*), then its BodyNum child
print(memoryElem.text)  # .text is the text content of the element, here 6168
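
To make the difference between the element's name and its text concrete, here is a minimal sketch. It assumes a trimmed inline copy of the question's XML instead of file.xml, and it falls back to the standard library if lxml is not installed (both accept the same path in find). It also guards against find returning None, which is what happened in the question.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for this sketch only: the stdlib exposes the same find() API
    import xml.etree.ElementTree as etree

# Assumption: trimmed inline stand-in for the question's file.xml
XML = """<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum>
    <Plant>239</Plant>
  </TExportCarcass>
</rootnode>"""

root = etree.fromstring(XML)
elem = root.find("*/BodyNum")  # any child of the root, then its BodyNum child

if elem is None:
    # find() returns None when the path matches nothing
    print("BodyNum not found")
else:
    print(elem.tag)   # the element's name
    print(elem.text)  # the element's text content
```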

Upvotes: 2

moebius

Reputation: 2259

When you pass find a bare tag name, it only searches the direct children of the root element, and BodyNum is nested one level deeper. You can instead pass find an XPath-style path to search for the element anywhere in the document:

  1. To get the first element only:
from lxml import etree
doc = etree.parse('file.xml')

memoryElem = doc.find('.//BodyNum')
print(memoryElem.text)
# 6168
  2. To get all elements:
print([b.text for b in doc.iterfind('.//BodyNum')])
# ['6168', '6169']
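
For comparison, a small sketch that puts the failing lookup and the working ones side by side. It assumes a trimmed inline copy of the question's XML rather than file.xml, and falls back to the standard library when lxml is missing. The tag NoSuchTag is made up purely to show the default argument of findtext.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for this sketch only: same find/iterfind/findtext API
    import xml.etree.ElementTree as etree

# Assumption: trimmed inline stand-in for the question's file.xml
XML = """<rootnode>
  <TExportCarcass><BodyNum>6168</BodyNum></TExportCarcass>
  <TExportCarcass><BodyNum>6169</BodyNum></TExportCarcass>
</rootnode>"""

root = etree.fromstring(XML)

direct = root.find("BodyNum")      # direct children of <rootnode> only
first = root.find(".//BodyNum")    # first matching descendant at any depth
all_nums = [b.text for b in root.iterfind(".//BodyNum")]

# findtext() returns the text directly and accepts a default for no match
missing = root.findtext(".//NoSuchTag", default="n/a")

print(direct)      # None, as in the question
print(first.text)
print(all_nums)
print(missing)
```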

Upvotes: 2

RomanPerekhrest

Reputation: 92854

Your document contains multiple BodyNum elements.
If you need only the first one, put an explicit positional limit into the query.

Use the following flexible approach based on an XPath query:

from lxml import etree

doc = etree.parse('file.xml')
memoryElem = doc.xpath('(//BodyNum)[1]/text()')
print(memoryElem)   # ['6168']
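
The parentheses in that query matter. The sketch below, which assumes a trimmed inline copy of the question's XML instead of file.xml and needs lxml for xpath(), shows how the result changes without them and how to get a plain string instead of a list.

```python
from lxml import etree

# Assumption: trimmed inline stand-in for the question's file.xml
XML = b"""<rootnode>
  <TExportCarcass><BodyNum>6168</BodyNum></TExportCarcass>
  <TExportCarcass><BodyNum>6169</BodyNum></TExportCarcass>
</rootnode>"""

root = etree.fromstring(XML)

# (//BodyNum)[1]: first BodyNum in the whole document
first_in_doc = root.xpath("(//BodyNum)[1]/text()")

# //BodyNum[1]: every BodyNum that is the first BodyNum child of its parent
first_per_parent = root.xpath("//BodyNum[1]/text()")

# string(...) returns a single string rather than a list
first_as_str = root.xpath("string((//BodyNum)[1])")

print(first_in_doc)       # ['6168']
print(first_per_parent)   # ['6168', '6169']
print(first_as_str)       # 6168
```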

Upvotes: 0

Rakesh

Reputation: 82755

You need to iterate over each TExportCarcass element and then use find to access its BodyNum child.

Example:

from lxml import etree

doc = etree.parse('file.xml')
for elem in doc.findall('TExportCarcass'):
    print(elem.find("BodyNum").text) 

Output:

6168
6169

or

print([i.text for i in doc.findall('TExportCarcass/BodyNum')]) #-->['6168', '6169']
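
If you need more than BodyNum, the same loop can collect several fields per carcass. This is only a sketch: it assumes a trimmed inline copy of the question's XML instead of file.xml, falls back to the standard library when lxml is missing, and the dictionary keys (body_num, weight, plant, piece_indexes) are names I made up for illustration.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for this sketch only: same findall/findtext API
    import xml.etree.ElementTree as etree

# Assumption: trimmed inline stand-in for the question's file.xml
XML = """<rootnode>
  <TExportCarcass>
    <BodyNum>6168</BodyNum><BodyWeight>331.40</BodyWeight><Plant>239</Plant>
    <pieces>
      <TExportCarcassPiece index="0"><Bruising>0</Bruising></TExportCarcassPiece>
      <TExportCarcassPiece index="1"><Bruising>0</Bruising></TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
  <TExportCarcass>
    <BodyNum>6169</BodyNum><BodyWeight>334.40</BodyWeight><Plant>278</Plant>
    <pieces>
      <TExportCarcassPiece index="0"><Bruising>0</Bruising></TExportCarcassPiece>
    </pieces>
  </TExportCarcass>
</rootnode>"""

root = etree.fromstring(XML)

records = []
for carcass in root.findall("TExportCarcass"):
    records.append({
        # findtext() is relative to the current carcass element
        "body_num": int(carcass.findtext("BodyNum")),
        "weight": float(carcass.findtext("BodyWeight")),
        "plant": carcass.findtext("Plant"),
        # attributes are read with get(), not .text
        "piece_indexes": [
            piece.get("index")
            for piece in carcass.findall("pieces/TExportCarcassPiece")
        ],
    })

for record in records:
    print(record)
```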

Upvotes: 2

Faizan Naseer

Reputation: 627

Just use Python's built-in xml.etree.ElementTree module

https://docs.python.org/3/library/xml.etree.elementtree.html
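
A minimal sketch of what that could look like, assuming a trimmed inline copy of the question's XML in place of file.xml. Note that a bare find('BodyNum') returns None here too, for the same reason as with lxml, so the sketch uses iter to walk all descendants.

```python
import xml.etree.ElementTree as ET

# Assumption: trimmed inline stand-in for the question's file.xml.
# For a real file, use: root = ET.parse('file.xml').getroot()
XML = """<rootnode>
  <TExportCarcass><BodyNum>6168</BodyNum></TExportCarcass>
  <TExportCarcass><BodyNum>6169</BodyNum></TExportCarcass>
</rootnode>"""

root = ET.fromstring(XML)

# iter() visits every descendant element with the given tag
body_nums = [elem.text for elem in root.iter("BodyNum")]
print(body_nums)
```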

Upvotes: 0
